Our goal is to assure that our citizens know enough about science so that they:

can tell the difference between sense and nonsense, between science and pseudoscience

can distinguish the possible from the impossible, the probable from the improbable 

can understand both the powers and limits of science and technology 

are not at the mercy of experts, or worse, of charlatans posing as experts

can be participants, not victims, in our increasingly and irreversibly technological
society.



The following entries represent current Test Center holdings in the area of alternative assessment
ideas for science. "Alternative," for this purpose, means "other than standardized,
norm-referenced." The list emphasizes performance assessments, portfolios, technological innovations,
etc. Some of the entries may be intended for informal classroom use. For more information,
contact Matthew Whitaker, Test Center Clerk, at (503) 275-9582, Northwest Regional
Educational Laboratory, 101 SW Main Street, Suite 500, Portland, Oregon 97204, e-mail:
testcenter@nwrel.org. To purchase a copy of this bibliography, please call NWREL's Document
Reproduction Service at (503) 275-9519.

In this bibliography, terms are used in the following way: open-response = tasks with only one 
right answer; open-ended = having more than one right answer; holistic rubric = one score based 
on overall impression; analytical trait rubric = performance judged along several dimensions; task
specific rubric = rubric tailored for a specific task; and generalized rubric = rubric used across
tasks. 



(David Saxon. Massachusetts Institute of Technology, February 17, 
Abraham, Michael R., Eileen Bross Grzybowski, John W. Renner, et al. Understandings
and Misunderstandings of Eighth Graders of Five Chemistry Concepts Found in 
Textbooks. Located in: Journal of Research in Science Teaching 29, 1992, pp. 105-120. 

The study reported in this paper looked at how well grade eight students understand five 
concepts in chemistry: chemical change, dissolution, conservation of atoms, periodicity, and 
phase-change. There are five problems, one associated with each concept. Each problem 
describes (and/or shows) a problem situation and asks one to three questions. Some questions 
require short answers and some require explanations of answers. Each response is scored on 
a six-point scale of conceptual understanding from "no response" to "sound understanding" of 
the concept. The paper gives some examples of misunderstandings shown by the students.

The authors found that very few students really understood the concepts. They speculate that 
this may either be due to the nature of instruction (mostly textbook driven and little hands-on) 
or because students are not developmentally ready for the formal logic found in these
concepts. The paper also reports some information on student status and the relationship 
between scores on this test and another measure of formal logical thinking. 

A related study using the same five tasks is A Cross-Age Study of the Understanding of Five
Chemistry Concepts by Michael R. Abraham, Vickie M. Williamson, and Susan Westbrook 
(TC#600.3CROAGS). Available from: The Department of Chemistry and Biochemistry, 
University of Oklahoma, 620 Parrington Oval, Norman, OK 73019.

(TC#650.3UNDMIE) 



Alberta Education. Diploma Examinations Program: Chemistry 30, Physics 30, Biology 30,
1991. Available from: Learning Resources Distributing Centre, 12360 - 142 St., 
Edmonton, AB T5L 4X9, Canada, (403) 427-2767, fax (403) 422-9750. 

Alberta Education develops high school diploma examinations in several course areas. These,
combined with school-awarded "marks," are used to assign credit for the courses. We have
received the 1991 versions of the exams for Chemistry 30, Physics 30, and Biology 30. There
are three types of questions: multiple-choice, "numerical response" (students "bubble" their
answers onto the scan sheet), and written response. All three tests include multiple-choice
questions; the other two formats differ between tests.

All tests appear to focus on subject-area knowledge rather than on problem solving,
communication, reasoning, science process skills, etc. Examinations are given locally under
controlled conditions. Papers are scored centrally. Scoring appears to be based on the 
correctness of the answer. 

(TC# 600.3DIPEXP) 
Alberta Education. Evaluating Students' Learning and Communication Processes,
January 1993-January 1994. Available from: The Learning Resources Distributing
Centre, 12360 - 142 St., Edmonton, AB T5L 4X9, Canada, (403) 427-2767,
fax (403) 422-9750.

The goals of the Evaluating Students' Learning and Communication Processes program are
to: (1) evaluate progress of secondary students (grades 7-10) in six learning and
communication processes; (2) integrate the six processes across classes in language arts, 
social studies, and science; and (3) empower students to take control of learning by making 
them conscious of the six process skills and how they, themselves, use them. It is based on 
the premise that students' achievement is directly related to the extent to which they have 
conscious, independent control over essential learning and communication processes. The six 
learning and communication processes are: exploring, narrating, imagining, empathizing
(understanding the perspectives of others), abstracting (creating, supporting, applying, and
evaluating generalizations), and monitoring. The materials provide generalized performance criteria
(indicators) that serve both to define each process skill and to provide a mechanism for 
judging the quality of student use of the skill, regardless of the area in which they are working. 

There is a general handbook for all subject areas that covers evaluation (performance criteria 
and recording information) and instruction (how to implement the program, instructional 
activities for students, help with student self-reflection, help with teacher collaboration, and 
how to report student progress). There is a separate handbook for each subject area that 
contains sample teaching units designed to show teachers how to incorporate diagnostic
evaluation of students' learning and communication processes into regular instruction. In 
science the diagnostic teaching units are in the areas of structures/design for grade 7 and 
acids/bases for grade 10. 

The documents give a good rationale for the importance of the six process skills and the 
importance of student self-monitoring of the processes. They also give extremely good advice 
on how to design instructional tasks that require students to use the six process skills, how to 
use instructional tasks as a context for student self-monitoring of process skills, and how to 
evaluate progress on these skills. The documents are also very useful because they have 
attempted to define process skills and apply them across subject matter areas. No technical 
information is provided. Some sample student work is provided. 



(TC# 600.3EVASTL) 



Appalachia Educational Laboratory. Alternative Assessments in Math and Science: Moving 
Toward a Moving Target, 1992. Available from: Appalachia Educational Laboratory, 
PO Box 1348, Charleston, WV 25325, (304) 347-0400. 

This document reports on a two-year study by the Virginia Education Association and the 
Appalachia Educational Laboratory. In the study, 11 pairs of K-12 science and math teachers
designed and implemented new methods of evaluating student competence and application of
knowledge.
Teachers who participated in the study found that the changes in assessment methods led to
changes in their teaching methods, improvements in student learning, and better student
attitudes. Instruction became more integrated across subjects and shifted from being
teacher-driven to being student-driven. Teachers acted more as facilitators of learning than as
dispensers of information.

Included in the report is a list of recommendations for implementing alternative assessments, a 
list of criteria for effective assessment, and 22 sample activities (with objectives, tasks, and 
scoring guidelines) for elementary, middle, and high school students, all designed and tested 
by the teachers in the study. Most activities have performance criteria that are holistic and 
task specific. No technical information or sample student work is included. 

(TC#600.3ALTASM) 

Arter, Judith A. Integrating Assessment and Instruction, 1994. Available from: Northwest 
Regional Educational Laboratory, 101 SW Main St., Suite 500, Portland, OR 97204, 
(503) 275-9582, fax: (503) 275-9489. 

Although not strictly about science assessment, this paper is included because of its discussion 
of how, if designed properly, performance assessments can be used as tools for learning in the 
classroom as well as tools for monitoring student progress. 

(TC# 150.6INTASI) 

Arter, Judith A. Performance Criteria: The Heart of the Matter, 1994. Available from: 
Northwest Regional Educational Laboratory, 101 SW Main St., Suite 500, Portland, 
OR 97204, (503) 275-9582, fax: (503) 275-9489. 

Although not directly related to science assessment, this paper discusses an important issue
that pertains to performance assessment in general: the need for clear and well-thought-out
scoring mechanisms. The paper discusses what performance criteria are, the importance of 
good quality performance criteria, how to develop performance criteria, and keys to success. 
The author argues for generalized, analytical trait performance criteria that cover all important 
aspects of a performance and are descriptive. 

(TC# 150.6PERCRH)

Aurora Public Schools. Performance Assessments in Science and Mathematics, 1994. 
Available from: Strategic Plan Facilitator, Aurora Public Schools, Department of 
Instructional Services, 15751 E. 1st Ave., Suite 220, Aurora, CO 80011, (303) 340-0861,
fax: (303) 340-0865. 

The author has provided three examples of the types of assessments being developed by
teachers in Aurora Public Schools: developing an analogy for the major anatomical and
physiological components of a typical eukaryotic cell, recommending a decision concerning
the future use of medical technology in human biology, and collecting and analyzing a data
set. These examples, for secondary students, include a description of the task, prerequisite
student experiences, and criteria for judging student performance on the task. Students work
in groups of two to four. The assessments are mostly for classroom use.

Performances are evaluated along several dimensions including content, complex thinking, and 
collaborative working. Most of the rubrics are task specific and emphasize relative quality. 
For example, a "4" score for complex thinking on the medical technology task is: "The
student clearly and completely identified the criteria by which the alternatives were assessed.
The criteria were presented in detail and reflected an unusually thorough understanding and
concern for the repercussions of the decision." The collaborative worker rubric is generic and
more descriptive; a "4" is: "The student expressed ideas clearly and effectively; listened
actively to the ideas of others; made a consistent effort to ensure that ideas were clearly and
commonly understood; accurately analyzed verbal and non-verbal communications; solicited
and showed respect for the opinions of others."

Neither technical information nor sample student responses are included.

(TC# 000.3SCIMAP)

Badger, Elizabeth and Brenda Thomas. On Their Own: Student Response to Open-Ended
Tasks in Mathematics, 1989-91. Available from: The Commonwealth of Massachusetts,
Department of Education, 1385 Hancock St., Quincy, MA 02169, (617) 770-7334. 

The materials we received contain assessment materials for grades 4, 8, and 12 from three
years (1988-1990) in four subject areas: science, math, social studies, and reading. This entry
describes the science portion of the materials.

In the 1988 and 1990 materials, students were given a written problem in which they had to 
apply concepts of experimental design, or use concepts in life or physical sciences, to explain a 
phenomenon. In 1988, three problems were given to fourth graders, six problems to eighth
graders, and seven problems to twelfth graders. In 1990, three problems were given to fourth 
graders, and four were given to eighth and twelfth graders. Some of these were repeated 
across grade levels. All problems are included in this document. Responses were analyzed for
the ability to note important aspects of designing an experiment or the amount of 
understanding of concepts they displayed. No specific performance criteria or scoring 
procedures are provided. However, there is extensive discussion of what students did, 
illustrated by sample responses. Because some of the information was also presented in 
multiple-choice format, the state was able to conclude that "although students appear to know 
and recognize the rules and principles of scientific inquiry when presented as stated options,
unstructured situations that demand an application of these principles seem to baffle them." 

In 1989, a sample of 2,000 students was assigned one of seven performance tasks (the three in
science required lab equipment and/or manipulatives) to do in pairs. Each pair was
individually watched by an evaluator. Each evaluator could observe between six and ten pairs
each day. It took 65 evaluators five days to observe the 2,000 performances. Evaluators
were both to check off those things that students did correctly (e.g., measured temperature
correctly) and to record observations of students' conversations and strategies. Again, detailed
scoring procedures are not provided. There is, again, much discussion of observations
illustrated by samples of student responses.

Some information about results for all the assessments is provided: percentages of students 
getting correct answers, using various strategies, using efficient methods, giving good 
explanations, etc., depending on the task. No technical information about the tests themselves 
is provided. 

(TC#600.3ONTHOS) 



Baker, Eva L., Pamela R. Aschbacher, David Niemi, et al. CRESST Performance
Assessment Models: Assessing Content Area Explanations, April 1992. Available from:
National Center for Research on Evaluation, Standards, and Student Testing
(CRESST), Center for the Study of Evaluation, UCLA Graduate School of Education,
145 Moore Hall, Los Angeles, CA 90024, (310) 206-1532, fax (310) 825-3883.

The authors provide two detailed examples of performance assessments for high school
students: history and chemistry. In addition to these two specific examples, the document
includes help on duplicating the technique with other subject matter areas, including rater
training, scoring techniques, and methods for reporting results. The general procedures
include: a Prior Knowledge Measure, which assesses (and activates) students' general and
topic-relevant knowledge; provision of primary-source/written-background materials; a
writing task in which students integrate prior and new knowledge to explain subject matter
issues in response to prompts; and a scoring rubric.

The prior knowledge portion of the chemistry example consists of 20 chemistry terms for 
which students "write down what comes to mind drawing upon [their] knowledge of 
chemistry." The "written materials" consist of a description of how a chemistry teacher tests
samples of soda pop to determine which contain sugar and which contain an artificial
sweetener. The writing task involves assisting a student who has been absent to prepare for 
an exam. 

Scoring is done on a scale of 0-5 for each of: overall impression, prior knowledge, number of 
principles or concepts cited, quality of argumentation, amount of text-based detail, and 
number of misconceptions. (The scoring scheme is elaborated upon for the history example, 
but not for the chemistry example.) Scoring on several of these scales is based on the
number of instances of a response rather than their quality. For example, conceptual 
misunderstanding is scored by counting the number of misunderstandings. Only the 
"argumentation" scale calls for a strictly quality judgment. 
No technical information is included. Sample student responses are provided for the history
example but not the chemistry example. 



(TC# 000.3CREPEA) 



Barnes, Lehman W., and Marianne B. Barnes. Assessment, Practically Speaking. Located 
in: Science and Children, March 1991, pp. 14-15.

The authors describe the rationale for performance assessment in science. Traditional tests 
(vocabulary, labeling, matching, multiple-choice, short-answer, puzzle, questions, essay) 
accurately assess student mastery of the verbal aspects of science, but they do not allow
students to demonstrate what they know. 



(TC#600.6ASSPRS) 



Baron, Joan B. Performance Assessment: Blurring the Edges Among Assessment, 
Curriculum, and Instruction, 1990. Located in: Champagne, Lovitts and Calinger
(Eds.), Assessment in the Service of Instruction, 1990, pp. 127-148. Available from:
American Association for the Advancement of Science, 1333 H St. NW, Washington,
DC 20005 [AAAS Books: (301) 645-5643]. Also in: G. Kulm & S. Malcom (Eds.),
Science Assessment in the Service of Reform, 1991, pp. 247-266, AAAS.

After a brief discussion of the rationale for doing performance assessments in science, this 
article describes work being done in Connecticut as of 1991. The tasks for these assessments
have three parts that involve a blend of individual work at the beginning and end, and group 
work in the middle: 

1. At the beginning, each student provides information about his or her prior knowledge and
understandings of the scientific concepts and processes relevant to the task. The student 
also provides a preliminary solution to the task. This serves to encourage preliminary 
thinking, brings diversity to the thinking of the group, makes more obvious what each 
student brings to the task, has instructional value, and provides a baseline for students to 
refer to later. 

2. Students then work as a team to produce a group product. Throughout this process,
individual students report their views/summaries/insights of the work of the group. 

3. After the group work, a transfer task is completed individually by each student.

The paper then spends some time discussing how to structure the tasks used in such
assessments, and also the learning theory and collaborative learning research that underpin the 
approach. 

The paper concludes with a discussion of current issues in performance assessment in science 
including:

• They take a lot of time.

• The concepts assessed are harder to teach and harder for students to grasp.

• Teachers are concerned about covering the material that is required in course guides.

• They require a great deal of expertise on the part of the teacher.

(TC#600.6PERASB)

Bennett, Dorothy. Assessment & Technology Videotape, 1991. Available from: The Center
for Technology in Education, Bank Street College of Education, 610 W. 112th St., New
York, NY 10025, (212) 875-4550.

The Center for Technology in Education (CTE) has been conducting research on how best to 
use technology in assessment. It supports the use of video to capture aspects of students' 
performance that cannot be assessed with paper and pencil. 

This document consists of a video and handbook that focus on the assessment of thinking 
skills, communication skills, and interpersonal skills. Its context is a group project that
requires applying physics to design motorized devices that produce at least two
simultaneous motions in different directions to accomplish an action or set of actions.

The first part of the video describes an alternative assessment system that uses students' 
personal journals, group logs, projects, and presentations. Personal journals document 
students' personal experiences with technology outside the classroom and their observations
about how things work. Group logs document group problem-solving and dynamics. The
group projects and presentations are the major part of the assessment. Presentations are
videotaped and scored by a panel of experts and other students.

The second part of the video contains four examples of students' presentations (car wash, 
tank, garbage truck, oscillating fan) which can be used to practice scoring using the criteria set 
forth in the handbook. Performances are scored using generalized criteria for thinking skills, 
communication/presentation skills, and work management/interpersonal skills, by looking at 
the relative numbers of positive and negative instances of each behavior.

Brief descriptions of the above criteria are contained in the handbook. The procedure is a
prototype. Feedback by those attempting to use the criteria is requested.

(TC#600.3ASSTEVh and 600.3ASSTEVv) 
California Assessment Program (CAP). New Directions in CAP Science Assessment, 1990.
Available from: California Department of Education, PO Box 944272, Sacramento,
CA 94244, (916) 445-1260, fax (916) 657-5101.

The new California science curriculum identifies six themes (energy, evolution, patterns of 
change, scale/structure, stability and systems/interactions) that cut across three content areas 
(earth sciences, physical sciences, and life sciences). CAP is developing multiple-choice,
open-ended, and performance items to match this curriculum. 

CAP administered open-ended science questions to 8,000 sixth graders in the spring of 1989.
The questions required students to create hypotheses, design investigations, and write about 
social and ethical issues in science. Each task took 10 to 15 minutes. 

CAP also field tested five performance assessment tasks with about 50,000 sixth graders in
spring 1990. Tasks were administered at five stations and took about 10 minutes each. The
tasks were:

1. Building a circuit, and then predicting, testing, and recording the conductivity of various
materials.

2. Creating a classification system for a collection of leaves, and explaining the adjustments
necessary when a "mystery leaf" is introduced into the group.

3. Performing a number of tests on a collection of rocks, and then recording and classifying
the results.

4. Estimating and measuring water volumes.

5. Performing chemical tests on samples of lake water.

This document includes instructions for administering one of the performance tasks
(electricity), seven letters written by students commenting on the assessment, and two
open-ended questions with sample student responses.

(TC#600.3NEWMI)



California Department of Education. Science: New Directions in Assessment, California
Learning Assessment System, 1993. Available from: California Department of
Education, PO Box 944272, Sacramento, CA 94244, (916) 657-2451, fax: (916) 657-5101.

This document contains the following: five performance tasks (1991 pilot performance tasks
for grades 8 and 11, and 1992 performance tasks for grades 5, 8, and 10); a newsletter
describing the current status of the science portfolio for grades 5, 8, and 10; and an overview of
the California Learning Assessment System (CLAS) for 1990-1992. (The CLAS also has an
enhanced multiple-choice section that is described, but not illustrated, in this document.)
The tasks require some individual and some group work, have multiple questions focusing on
a common theme (e.g., recycling or fossils), and require several class periods to complete.
Scoring on the tasks in grades 5 and 8 is task specific (on a scale of 0-4 or 0-6); scoring on
the grade 10 and 11 tasks uses a general 1-4 point scoring guide that emphasizes
understanding, detailed observations, good quality data, good experimental design, organized 
presentation of data, supported conclusions, and reasonable explanations. 

Neither technical information nor sample student responses are included. 

(TC#600.3SCINED) 



California Department of Education. Science Portfolios: The Watershed, October 1992.
Available from: California Assessment Program, California Dept. of Education, 
PO Box 944272, Sacramento, CA 94244, (916) 657-3747, fax: (916) 657-5101. 

This copy of the newsletter, The Watershed, contains an article on writing in science, ideas on
science portfolios, and a nice statement of assessment as an instructional tool. 

(TC#600.6SCIPOW) 

California Department of Education. Golden State Examination Biology and Chemistry, 
Draft, 1992. Available from: California Department of Education, PO Box 944272, 
Sacramento, CA 94244, (916) 657-2451, fax: (916) 657-5101.

The purpose of the Golden State Examination is to identify and recognize students with 
outstanding achievement in biology, algebra, geometry, US history, chemistry, and economics. 
This document describes the 1993 assessments. 

There are two required sections taking 45 minutes each. The first section is multiple-choice, 
justified multiple-choice, and short answer. The second section is a laboratory task, 
performed individually by using materials that have been set-up at testing stations. The 
purpose is to assess student ability to use laboratory equipment, make observations, conduct 
experiments, interpret results, and analyze data. (A third, optional, section is the science
portfolio, described in another entry in this bibliography, TC#600.3GOLSTE.)

Results are scored using a generic 1-6 point scale tailored to specific tasks. The generic
scoring guide emphasizes knowledge of biological concepts, creative use of principles, 
relevant alternative explanations, sound analysis of data, and clear communication. 

Sample tasks and student responses are included. No technical information is included. 
(TC#600.3GOLSTB) 
California Department of Education. Golden State Examination Science Portfolio: A
Guide for Teachers and A Guide for Students, 1994. Available from: California
Department of Education, PO Box 944272, Sacramento, CA 94244, (916) 657-2451, fax:
(916) 657-5101.

The Golden State Examination (GSE) portfolio is a collection of student work produced
during a year of high school biology, chemistry, or second-year coordinated science. It allows
students to present for evaluation a broader representation of performance, exhibiting depth of 
conceptual and procedural knowledge. This is an optional component for the GSE. Scores 
are combined with the multiple-choice, short-answer, open-ended, and laboratory- 
performance portions of the GSE only if it would serve to improve the student's overall score. 

Students must include three entries in the portfolio to demonstrate: problem-solving, creative 
expression, and growth through writing. Each entry is accompanied by a self-reflection sheet. 
There are both teacher and student guides which explain in detail what types of things should 
go in the portfolio and criteria for success. Examples of student work are included. No 
technical information is provided. 

(TC#600.3GOLSTE) 

Center for Talent Development. Elementary School Pre-Post Survey and Middle/High
School Pre-Post Survey, 1993. Available from: Evaluation Coordinator, Center for
Talent Development, Northwestern University, Andersen Hall, 2003 Sheridan Rd., 
Evanston, IL 60208, (708) 491-4979. 

This document contains surveys of student attitudes toward mathematics and science. There
are two levels: elementary and middle/high school. It was designed for use with Access 2000
participants, who are primarily African-American and Hispanic students in an inner-city public
school system enrolled in a math/science/engineering enrichment program.

(TC# 220.3QUEELM)

Champagne, Audrey, B. Lovitts, and B. Calinger. Assessment in the Service of Instruction, 
1990. Available from: American Association for the Advancement of Science, 
1333 H St. NW, Washington, DC 20005 [AAAS Books: (301) 645-5643].

This book is a compilation of eleven papers that address the issue of making assessment a tool 
for meaningful reform of school science. The book contains papers that cover: an overview 
of good assessment, national and state assessment initiatives, traditional assessments, 
innovative assessments (performance, group, portfolio, and dynamic), and experiences in 
England and Wales. 

The introductory article by two of the editors (Assessment and Instruction: Two Sides of the 
Same Coin) covers the following topics: 
1. Reasons for assessing, including instruction, conveying expectations, monitoring
achievement, accountability and program improvement. 

2. What should be assessed, and the inability of multiple-choice tests to assess the most 
important aspects of scientific competence: generating and testing hypotheses, designing 
and conducting experiments, solving multi-step problems, recording observations, 
structuring arguments, and communicating results; or scientific attitudes: comfort with 
ambiguity and acceptance of the tentative nature of science. 

3. A definition of "authentic" assessment: "An assessment is authentic only if it asks students 
to demonstrate knowledge and skills characteristic of a practicing scientist or of the 
scientifically literate citizen." Simply matching the curriculum is not enough, because the 
curriculum may be lacking. 

Other articles from this book that are particularly relevant to this bibliography are described 
separately. 

(TC#600.6ASSINT) 



Chi, M.T., P.J. Feltovich, and R. Glaser. Categorization and Representation of Physics 
Problems by Experts and Novices. Located in: Cognitive Science 5, 1981, pp. 121-152. 

The authors report on a series of studies to determine the differences between expert and 
novice problem solvers in physics. Although this paper is not about assessment per se, the 
observations in the paper might help users to define what good physics problem solving looks 
like, which in turn can serve as the basis for forming performance criteria to be used with 
performance assessments. 

Expert problem solvers begin with a brief analysis of the problem statement to categorize the 
problem (i.e., determine which schema to activate). Once activated, the schema itself specifies 
further tests for its appropriateness. When the expert has decided that a particular principle is, 
indeed, appropriate, then the knowledge contained in the schema provides the general form 
that specific equations, to be used for solution, will take. This contrasts with novice problem
solvers, who use superficial characteristics to categorize problems and lack procedural
connections.

Several samples of expert and novice thinking are provided. 
(TC#660.6CATREP) 
Coalition of Essential Schools. Various articles on exhibitions of mastery and setting
standards, 1982-1992. Available from: Coalition of Essential Schools, Brown 
University, Box 1969, One Davol Sq., Providence, RI 02912, (401) 863-3384. 

Although not strictly about science, this series of articles discusses performance assessment
topics and goals for students that are of relevance to science. The articles are: Rethinking
Standards; Performances and Exhibitions: The Demonstration of Mastery; Exhibitions:
Facing Outward, Pointing Inward; Steps in Planning Backwards; Anatomy of an Exhibition;
and The Process of Planning Backwards.

These articles touch on the following topics: good assessment tasks to give students, the need
for good performance criteria, the need to have clear targets for students that are then 
translated into instruction and assessment, definition and examples of performance 
assessments, brief descriptions of some cross-disciplinary tasks, the value in planning 
performance assessments, and the notion of planning backwards (creating a vision for a high 
school graduate, taking stock of current efforts to fulfill this vision, and then planning 
backward throughout K-12 to make sure that we are getting students ready from the start). 

(TC#150.6VARARD)

Colison, J. Connecticut's Common Core of Learning, 1990. Available from: Performance 
Assessment Project, Connecticut Department of Education, Box 2219, Hartford, 
CT 06145, (203) 566-4001.

The Connecticut Department of Education is developing a series of performance assessments 
in science and math. Each task has three parts: individual work to activate previous 
knowledge; group work to plan and carry out the task; and individual work to check for 
application of learning. This document provides: 

1. A lengthy description of one of the ninth-grade science tasks: "speeders"

2. Short descriptions of 24 performance tasks in science (8 each in chemistry, physics, and 
earth sciences), and 18 in math 

3. A group discussion self-evaluation form to be used by students

No technical information or general scoring guides are included in this document. 

(TC#600.3CONSCI) 
Collins, Angelo. Portfolios for Assessing Student Learning in Science: A New Name for a
Familiar Idea?, 1990. Located in: Champagne, Lovitts and Calinger (Eds.),
Assessment in the Service of Instruction, 1990, pp. 157-166. Available from: American
Association for the Advancement of Science, 1333 H St. NW, Washington, DC 20005
[AAAS Books: (301) 645-5643]. Also in: G. Kulm and S. Malcom (Eds.), Science
Assessment in the Service of Reform, 1991, pp. 291-300, AAAS.

This paper presents the rationale for using portfolios in science, defines and provides the
characteristics of such portfolios, and discusses what should go in them. There is no one
"right" way to do a portfolio. Portfolios will differ according to three factors: purpose, context,
and design.

Purpose includes what is to be shown with the portfolio: mastery of content? understanding
of and use of the processes by which this knowledge is constructed? student attitudes toward
science? student comfort with ambiguity and the tentative nature of science? Purpose also
includes how the portfolio will be used: student self-reflection? accountability? instruction?

Context includes such things as the age of the students and student interests and needs. 
Design covers such considerations as what will count as evidence, how much evidence is 
needed, how the evidence will be organized, who will decide what evidence to include, and 
evaluation criteria. 

This article focuses mostly on considerations when designing a portfolio system in science, but
a few brief examples are given.

(TC#600.6PORFOA) 



Collins, Angelo. Portfolios: Questions for Design. Located in: Science Scope 15, 
March 1992, pp. 25-27. 

This article repeats much of the information on this topic presented by the author in other
entries in this bibliography, and is a nice, short summary. The author appears to use the term
"purpose" (as in "determine the purpose for the portfolio") to mean "target" (what we
want to show about the student).

(TC#600.3PORQUD) 



Collins, Angelo. Portfolios for Science Education: Issues in Purpose, Structure, and 
Authenticity. Located in: Science Education 76, 1992, pp. 451-463.

The author teaches preservice science teachers. This paper discusses design considerations 
for portfolios in science and applies these considerations to portfolios for student science 
teachers, practicing science teachers, and elementary students. The design considerations the 
author suggests are: 


1. Determine what the portfolio should be evidence of, i.e., what the portfolio will be used to
show.

2. Determine what types of displays should go in the portfolio to provide evidence of #1. The 
author suggests and describes several types: artifacts (actual work produced), 
reproductions of events (e.g., photos, videotapes), attestations (documents about the work 
of the person prepared by someone else), and productions (documents prepared especially 
for the portfolio such as self-reflections). 

3. View the portfolio as a "collection of evidence" that is used to build the case for what is to
be shown. Those developing the portfolio should determine the story to be told (based on 
all the evidence available) and then lay this out in the portfolio so that it is clear that the 
story told is the correct one. 

(TC#600.6PORSCE)

Collins, Allan, Jan Hawkins, and John R. Frederiksen. Three Different Views of Students: 
The Role of Technology in Assessing Student Performance, Technical Report No. 12, 
April 1991. Available from: Center for Technology in Education, Bank Street College 
of Education, 610 W. 112th St., New York, NY 10025, (212) 875-4550, 
fax: (212) 316-7026. Also available from ERIC: ED 337 150.

This paper begins by discussing why assessment in science needs to change: if tests continue
to emphasize facts and limited applications of facts, the curriculum will be narrowed to these
goals. The paper then gives several good examples of how high-stakes uses of tests have 
negative, unintended side effects on curriculum and instruction. 

The authors use the term "systemically valid" to refer to assessments that are designed to 
foster (create) the learning they also assess. The authors discuss four criteria for systemically
valid tests: the test directly measures the attribute of interest, all relevant attributes are
assessed, there is high reliability, and those being assessed understand the criteria. They also
discuss criteria for quality tasks, examples of alternative assessment ideas, cost, cheating, and 
privacy. 

(TC#600.6THRDIV)



Connecticut State Department of Education. Connecticut Common Core of Learning 
Assessment 1989-1992. Available from: Connecticut State Department of Education, 
Division of Research, Evaluation, and Assessment, 165 Capitol Ave., Room 340, 
Hartford, CT 06106, (203) 566-4001. 

This package contains a variety of documents produced between 1989 and 1992. Included are
the rationale for the assessment, Connecticut's Common Core of Learning
(student learning objectives), the development process, several sample tasks and scoring
mechanisms, student and teacher feedback forms, summaries of student and teacher feedback
on the assessments collected using these forms, a process for developing performance tasks, a
survey of student attitudes about science and mathematics, and an example of concept mapping
as an assessment tool.

There appear to be two kinds of tasks: complex group projects and shorter on-demand tasks 
covering individual skills. The projects attempt to get at application and extension of 
knowledge and concepts. They require some individual and some group work and extend 
over several days. The on-demand portion covers knowledge and its application in limited 
situations. Performances are scored for group collaboration, process skills, and 
communication skills. Some of the rubrics are task specific and some are general; some are 
based on quantity (the number of possible solutions listed, for example) and some are more 
quality based. 

(TC#000.3CONCOC)



CTB McGraw-Hill. CAT/5 Performance Assessment Supplement, 1990. Available from: 
CTB/McGraw-Hill, PO Box 150, Monterey, CA 93942, (800) 538-9547, 
fax (800) 282-0266. 

The "CTB Performance Assessments" are designed either to stand alone or to be integrated with
the CAT/5 or CTBS/4. There are five levels for grades 2-11. The total battery includes
reading/language arts, mathematics, science, and social studies. There are 12-25 short-
response questions for each subtest. The math and science subtests take 30-40 minutes. The
entire battery takes two to three hours. (For the CAT/5 there is a checklist of skills that can
be used at grades K and 1.)

Some questions are grouped around a common theme. Many resemble multiple-choice
questions with the choices taken off. For example, questions on one level include: "What are
two ways that recycling paper products helps the environment?" "This table shows the air
temperatures recorded every two hours from noon to midnight... At what time did the
temperature shown on the thermometer most likely occur?" and "These pictures show some
of the instruments that are used in science... List two physical properties of the water in the jar
below that can be measured with the instruments shown in the pictures. Next to each
property, write the name of the instrument or instruments used to measure the property."

Some of the answers are scored right/wrong and some are scored holistically. The materials
we received contained no examples of the holistic scoring, so we are unable to describe it.
Scoring can be done either locally or by the publisher. When the Performance Assessments
are given with the CAT/5 or CTBS/4, results can be integrated to provide normative
information and scores in six areas. Only three of these, however, use the math and
science subtests: demonstrating content and concept knowledge, demonstrating knowledge of
processes/skills/procedures, and using applications/problem solving strategies. When the
Performance Assessments are given by themselves, only skill scores are available.
The materials we received contain sample administration and test booklets only. No technical
information or scoring guides are included. 

(TC# 060.3CAT5PA) 



Curriculum Corporation. Science - A Curriculum Profile for Australian Schools and Using
the Science Profile, 1994. Available from: Curriculum Corporation, St. Nicholas Pl.,
141 Rathdowne St., Carlton, Victoria, 3053, Australia, (03) 639-0699, fax (03) 639-1616.

These documents represent the science portion of a series of publications designed to 
reconfigure instruction and assessment in Australian schools. The project, begun in 1989, was 
a joint effort by the States, Territories, and the Commonwealth of Australia, initiated by the
Australian Education Council. 

The profiles are not performance assessments, per se, in which students are given 
predeveloped tasks. Rather, the emphasis has been on conceptualizing major student 
outcomes in each area and articulating student development toward these goals using a series 
of developmental continuums. These continuums are then used to track progress and are
overlaid on whatever tasks and work individual teachers give to students. 

The science profiles cover the strands of earth and beyond, energy and change, life and
living, natural and processed materials, and working scientifically. Each strand is divided into
subareas called "organizers." For example, the organizers for the strand of "working
scientifically" are: planning investigations, conducting investigations, processing data,
evaluating findings, using science, and acting responsibly. Each organizer is tracked through
eight levels of development. For example, the organizer of "processing data" has "talks about
observations and suggests possible interpretations" at Level 1, and "demonstrates rigour in
handling of data" at Level 8.

There are lots of support materials that describe what each strand means, how to organize 
instruction, types of activities to use with students, and how to use the profiles to track 
progress. Some samples of student work are included to illustrate development. The 
documents say that the levels have been "validated," but this information is not included in the 
materials we received. 



(TC# 600.3SCICUA) 



Curriculum Corporation. Technology - A Curriculum Profile for Australian Schools, 1994.
Available from: Curriculum Corporation, St. Nicholas Pl., 141 Rathdowne St., Carlton,
Victoria, 3053, Australia, (03) 639-0699, fax (03) 639-1616.

This document represents the technology portion of a series of publications designed to 
reconfigure instruction and assessment in Australian schools. The project, begun in 1989, was 
a joint effort by the States, Territories, and the Commonwealth of Australia, initiated by the 
Australian Education Council. 
The profiles are not performance assessments, per se, in which students are given
predeveloped tasks. Rather, the emphasis has been on conceptualizing major student 
outcomes in each area and articulating student development toward these goals using a series 
of developmental continuums. These continuums are then used to track progress and are 
overlaid on whatever tasks and work individual teachers give to students. 

The technology profiles cover the major strands of designing, making and appraising;
information; materials; and systems. Each strand is broken down into subareas called
"organizers." For example, the organizers for "designing, making and appraising" are 
investigating, devising, producing, and evaluating. Each organizer is tracked through eight 
levels of development. For example, "evaluating" goes from "describes feelings about own 
design ideas, products, and processes" at Level 1 to "analyzes own products and processes to
evaluate the effectiveness of methodologies used and the short and longer-term impact on 
particular environments and cultures" at Level 8. 

There are lots of support materials that describe what each strand means, how to organize 
instruction, types of activities to use with students, and how to use the profiles to track 
progress. Samples of student work are included to illustrate development. The document 
says that the levels have been "validated," but this information is not included in the materials 
we received. 

(TC# 600.3TECCUA) 

Dana, Thomas M., Anthony W. Lorsbach, Karl Hook, and Carol Briscoe. Students
Showing What They Know: A Look at Alternative Assessments, 1991. Located in: G.
Kulm and S. Malcom (Eds.), Science Assessment in the Service of Reform, 1991,
pp. 331-337. Available from: American Association for the Advancement of Science,
1333 H St. NW, Washington, DC 20005 [AAAS Books: (301) 645-5643].

The authors present short descriptions of assessment activities they have developed and used 
with students at the Florida State University School for grades 6-12 in physical science, 
biology, and chemistry. The assessments are based on the theory that students construct 
knowledge for themselves as they participate in educational activities. The authors briefly 
mention the following techniques: concept mapping, journals, scrap books, and oral 
interviews. The examples include mostly descriptions of tasks; there is mention, but not
elaboration, of the criteria for judging responses. The techniques emphasize student self-
evaluation.

(TC#600.3STUSHW) 
Darling-Hammond, Linda, Lynne Einbender, Frederick Frelow, and Janine Ley-King.
Authentic Assessment in Practice: A Collection of Portfolios, Performance Tasks,
Exhibitions, and Documentation, October 1993. Available from: National Center for
Restructuring Education, Schools, and Teaching (NCREST), Box 110 Teachers College,
Columbia University, New York, NY 10027.

This book contains sample performance assessments for grades 1-12 in science, math, social 
studies, writing and drama from a number of sources. Formats include exhibitions, projects, 
on-demand performance assessments, and portfolios. The authors have included reprints of
papers that discuss characteristics of "authentic" assessment, performance task design, and 
portfolios. Not all assessment information for each example is reproduced; the authors have 
usually excerpted or summarized information. Performance tasks are more thoroughly 
covered than performance criteria. In most cases no technical information or sample student 
responses are provided. 

There are five science samples that cover grades 5-12. Tasks include designing a carton, 
explaining springs, flotation, insulation, and explaining the motion of a maple seed.

(TC# 000.3AUTASP) 



Doran, Rodney, Joan Boorman, Fred Chan, et al. Assessment of Laboratory Skills in High
School Science, 1991. Available from: Graduate School of Education, State University of
New York at Buffalo, Buffalo, NY 14260, (716) 645-2455.

This document consists of four manuals (Science Laboratory Test in: Biology, General
Science, Chemistry, and Physics), and two overview presentations (Alternative Assessment of
High School Laboratory Skills and Assessment of Laboratory Skills in High School Science).
These describe a series of on-demand activities to assess high school student laboratory skills 
in science, and a study examining test reliability, inter-rater agreement, and correlations 
between different parts of the tests. 

Six hands-on tasks are presented in each content area manual (biology, chemistry, physics).
Each task has two parts. In Part A, students are given a problem to solve and are directed to
state an appropriate hypothesis, develop a procedure for gathering relevant observations or
data, and propose a method for organizing the information collected. After 30 minutes their
plans are collected. Plans are scored on three experimental design traits: statement of
hypothesis, procedure for investigation, and plan for recording and organizing
observations/data. In Part B, students are given a predeveloped plan to collect information on
the same questions as in Part A. They have 50 minutes to carry out the plan and compose a
written conclusion. Performance on Part B is scored for quality of the observations/data,
graph, calculations, and conclusion. This procedure ensures that success on Part B is not
dependent on Part A. Scoring is designed to be generic: the same criteria are used across
tasks. Individual tasks also have specific additional criteria.
The General Science test has six tasks set up in stations. Students spend ten minutes at each
station. Students answer specific questions that are classified as planning, performing, or 
reasoning. Scoring is not generalized; points are awarded for specific answers. 

All manuals include complete instructions for administering and scoring the tests. Only a few
sample student responses are provided. Results from a study done with 32 high schools in
Ohio showed that rater agreement was good, that the process was very time-consuming, and
that teacher reactions varied widely.

(TC#600.3ASSLAS)

Gayford, Christopher. Group problem solving in biology and the environment, 1989-93.
Available from: Department of Science and Technology, University of Reading, 
Reading, Berkshire RG6 1HY, England, UK, (073) 431-8867. 

This document consists of three journal articles: A Contribution to a Methodology for
Teaching and Assessment of Group Problem Solving in Biology Among 15-Year-Old Pupils.
Located in: Journal of Biological Education 23, 1989, pp. 193-198. Patterns of Group
Behavior in Open-Ended Problem Solving in Science Classes of 15-Year-Old Students in
England. Located in: International Journal of Science Education 14, 1992, pp. 41-49.
Discussion-Based Group Work Related to Environmental Issues in Science Classes with
15-Year-Old Pupils in England. Located in: International Journal of Science Education 15,
1993, pp. 521-529.

The author reports on a series of related studies in which secondary students engaging in 
group work are assessed on a variety of skills such as group process, problem solving, 
attitudes, and science process. The purposes of the studies were to: (1) explore the use of
group discussion as a way to develop and exercise skills such as communication, problem 
solving, and numeracy; (2) discover how students approach problem solving tasks; and (3) 
describe the group dynamics of students engaging in group problem solving tasks. The papers 
are included in this database because of the assessment devices developed by the author to 
examine student problem solving and process skills. 

The specific tasks in which students were engaged in these studies were discussions of 
controversial issues about the environment and practical investigations in which students were 
to determine the best source of a substance or the amount of water needed by various plants. 
Students worked in groups. Each task took 60-90 minutes. Performance was assessed
using a variety of scoring guides, the most detailed of which was a generalized rubric 
assessing ability to state the problem, ability to work cooperatively as a team, quality of 
reasons for choice of design, ability to modify the design as a result of experience, and ability 
to evaluate success. Performance was rated on a three-point scale. 

The papers include a good enough description of the tasks and scoring procedures that they
could be reproduced by the reader. The papers also include information about student
performance on the tasks. No other technical information or sample student responses are
included.

Permission to reproduce materials has been granted by the author.
(TC# 600.3CONTOM) 

Green, Barbara. Developing Performance-Based Assessments and Scoring Rubrics for
Science. Available from: Texas Education Agency, Instructional Outcomes 
Assessment, 1701 N. Congress Ave., Austin, TX 78701, (512) 463-9734. 

Paper presented at Texas Science Supervisors Association Cast Conference, Austin, TX, 
November 3, 199.1. 

The Texas Education Agency is field-testing performance tasks to assess grade 4 and 8 
science process skills. Actual tasks are secure; sample tasks not used in the assessment are 
available. The two we received require students to design an insulating container for ice
cubes (grade 4) and determine the absorbency of paper towels (grade 8). These illustrate the
two basic kinds of tasks: design and inquiry. Students plan and carry out their designs or
inquiries at stations having a standard set of disposable and nondisposable materials. Students 
respond in writing (showing pictures, diagrams, and data displays when appropriate) to 
printed directions. For example, the grade 4 task asks students to plan the design (draw a 
picture and write a description), construct the design and test it, improve the design, and write 
a report (written analysis and conclusion). 

Scoring uses a different holistic four-point scale for each of the two types of tasks: designs 
and investigations. For example, a "4" on design tasks means:

The overall response is consistent with a sound scientific approach to design. The response indicates that 
the student has a clear understanding of the problem. The response may, in some cases, define additional 
aspects of the problem or include extensions beyond the requires of the task. Some inconsistencies may be 
present, but they are overwhelmed by the superior quality of the response. A score point '4' response is
characterized by most of the following... 

The package of materials we received has descriptions of the two tasks, a sample student 
response for each (unscored), and the scoring guide for each. No technical information is 
included. The contact person has given permission for educators to reproduce, for their own 
students, the materials submitted. 

(TC# 600.3PERBAA) 
Hall, Greg. Performance Assessment in Science - STS Connections, 1993. Available from:
Alberta Education, Box 43, 11160 Jasper Ave., Edmonton, AB T5K 0L2, Canada,
(403) 427-0010, fax (403) 422-4200.

The "Grade 9 Science Performance-Based Assessment" consists of six stations set up in a
circuit at which students perform a variety of investigations. The six in the 1993 assessment
include: seed dispersal, calibrating a hydrometer and using it to measure the density of a sugar
solution, determining which of several choices is the best insulator, building a robot arm,
testing for contaminants, and examining an environmental issue. Three circuits,
accommodating a total of 15 students, are recommended. Each group requires two hours.
Students respond in writing to a series of questions.

Responses for the Grade 9 assessment were scored on two dimensions: problem
solving/inquiry skills and communication. The scoring guide is generalized (the same one is
used across all tasks) and uses a four-point (0-3) scale. A "3" for Inquiry is: "Analyzed and
readily understood the task, developed an efficient and workable strategy, strategy
implemented effectively, strategy supports a qualified solution, and appropriate application of
critical knowledge." A "3" for Communication is: "Appropriate, organized, and effective
system for display of information or data; display of information or data is precise, accurate,
and complete; and interpretations and explanations logical and communicated effectively."

The documents we have contain: a general overview of the procedures, complete activity
descriptions, an administration script, and the scoring guide. Student booklets for the 9th
grade assessment, technical information, and sample student responses are not included.

(TC# 600.3PERAST) 

Halpern, Diane (Ed.). Enhancing Thinking Skills in the Sciences and Mathematics, 1992. 
Available from: Lawrence Erlbaum Associates, Publishers, 365 Broadway, Hillsdale, 
NJ 07642, (800) 926-6579.

This book is not strictly about assessment. Rather, it discusses the related topics of "What 
should we teach students to do?" and "How do we do it?" The seven authors "criticize the 
conventional approach to teaching science and math, which emphasizes the transmission of 
factual information and rote procedures applied to inappropriate problems, allows little 
opportunity for students to engage in scientific or mathematical thinking, and produces inert 
knowledge and thinking skills limited to a narrow range of academic problems." (p. 118) In
general, they recommend that teachers focus on the knowledge structures that students should
know, use real tasks, and set up instruction that requires active intellectual engagement.
The authors give various suggestions on how to bring this about: instructional methods,
videodiscs, group work, and a host of others. The final chapter analyzes the various positions
and raises theoretical issues.

(TC#500.6ENHTHS) 
Hardy, Roy. Options for Scoring Performance Assessment Tasks. Available from:
Educational Testing Service, 1979 Lakeside Parkway, Suite 400, Tucker, GA 30084.

A presentation to the National Council on Measurement in Education, San Francisco, 
California, April 23, 1992. 

Four assessment tasks were developed to explore the feasibility of performance assessment as 
part of a statewide assessment program. Tasks were: shades of color (grades 1-2), 
discovering shadows (grades 3-4), identifying minerals (grades 5-6), and designing a carton 
(grades 7-8). The tasks are described in the paper, but not all of the relevant materials are
included. Each task was designed to take one hour. Most tasks are completed individually,
but one (cartons) is a group task.

Response modes were varied (multiple-choice, figural, short narratives, products), in part to 
see which are feasible, and in part to see how different kinds of scores relate to each other. 
Most scoring was right/wrong or holistic on degree of "correctness" of answer. Cartons was 
scored holistically on problem solving. The scoring procedures are described but not 
presented in detail. The paper also describes the process used to develop scoring rubrics, to 
train scorers at the state level, and to analyze the data. No sample student responses are
included in this document, although they were used in training.

The tasks were completed by 1,128 students from 66 classes in 10 school districts. Teachers
completed a survey (questions are included in the paper). Results showed that it took from
1/2 to three minutes to score the performances, interrater agreement ranged from .76 to the
high .90s, relationships between scoring procedures varied, and teachers liked the procedures.
In all, the author concluded that it is feasible to use performance tasks in statewide 
assessment. 

(TC#600.3OPTSCP) 



Harlen, Wynne. Performance Testing and Science Education in England and Wales, 1991.
Located in: Gerald Kulm and Shirley M. Malcom (Eds.). Science Assessment in the 
Service of Reform . Available from: American Association for the Advancement of 
Science, 1333 H St. NW, Washington, DC 20005 [AAAS Books: (301) 645-5643]. 

This is a good summary of the approach to science education and assessment currently under 
way in England and Wales. (For related information, see the entries under Chris Whetton.) It 
discusses the history of the project, provides three hands-on test questions as examples, and 
describes the issues and problems that have arisen thus far: comparability of tasks, the
amount of reading required of students, and the attempt to accomplish too many purposes
with a single assessment.

From the examples provided, it appears that the performance tasks are a series of open- 
response questions which address a single science process skill, e.g., interpreting information, 
planning an investigation, or observing. Students provide short answers that are evaluated
according to degree of correctness or as right/wrong. Criteria differ by task.



(TC#600.6PERTES) 



Heard, Virgil Gale. A Comparison of Science I and Science II with Traditional Curriculum,
July 25, 1992. Available from: Program Director for Science, Fort Worth Independent
School District, University Plaza, Suite SE216, 100 N. University Dr., Fort Worth,
TX 76107, (817) 871-2531.

The state of Texas recently provided school districts with the option of replacing traditional 
subject area courses with thematic, coordinated courses that integrate the life sciences, earth 
sciences, and physical sciences. A prototype Science II Pre/Post Test was developed to 
compare effects on student learning of implementing this approach (Science I and II) to a 
more traditional subject area approach to teaching science. It was administered to about 500
eighth graders in four pilot and three control schools. 

The paper-and-pencil test consists of 40-50 multiple choice questions. However, students are 
required to manipulate equipment at 8-9 laboratory stations in order to determine the 
"correct" responses to 18 of these test questions. Tests are scored electronically.

The following are included in this document: (1) a report of the test results, and (2) a copy of
the Pre/Post Test. No answer key is provided. 

(TC# 600.3SCIIIP) 

Helgeson, Stanley. Assessment of Science Teaching and Learning Outcomes, 1993.
Available from: The National Center for Science Teaching and Learning, 104 Research
Center, 1314 Kinnear Rd., Columbus, OH 43212, (614) 292-3339.

The author provides a nice summary of the following topics: purposes of student assessment 
in science, results of national and international assessments, the effects of high stakes 
assessment on instruction, assessment of attitudes in science, computer applications, and 
reasons for alternative assessment. 

(TC#600.6ASSSCT) 



Helgeson, Stanley L., and David D. Kumar. A Review of Educational Technology in Science 
Assessment, 1993. Available from: The National Center for Science Teaching and 
Learning, 1929 Kenny Rd., Columbus, OH 43210, (614) 292-3339. 

The authors found that current use of technology consists mainly of computerized
administration of multiple-choice tests drawn from item banks. They also describe
some more innovative applications such as computer-generated test questions, laboratory
simulations, and computer scoring of more open-ended tasks. However, most of these appear
to be based on assessing content and procedural knowledge rather than thinking skills.

(TC# 600.6REVEDT) 

Hibbard, K. Michael, Region 15 - Together for Students - A Community of Learners, 1993. 
Available from: Region 15 School District, PO Box 395, Middlebury, CT 06762, 
(203) 758-8250. 

This document contains handouts used in a training session that appears to be an overview of 
the Pomperaug Regional School District 15 student assessment system. In addition to a 
general overview and philosophy statement, the handouts include sample assessment materials
in science, social studies, math, and writing for grades 1-12. 

The science information focuses mostly on checklists for assessing writing in science. 
Neither technical information nor samples of student work are included.
(TC# 000.6TOGSTC) 

Hibbard, K. Michael. What's Happening?, 1991. Available from: Region 15 School 
District, PO Box 395, Middlebury, CT 06762, (203) 758-8250. 

This document is a series of performance tasks in which assessment is integrated with 
instruction. The tasks include: chemical reaction, consumer action research, plant growth, 
physiological responses of the human body, survival in the winter, science fiction movie 
development, and food webs. Each task includes assessment rating forms and checklists, 
some of which are designed for student self-assessment. For example, the survival in winter 
exercise includes a rating scale that assesses 12 features of the project on a scale of 1-5, and a 
rating scale for an oral presentation. Other tasks include performance criteria for group work 
and self-rating on perseverance. The performance criteria are a mixed bag. Some directly 
refer to specific features of the task (e.g., "detailed descriptions were given of each plants' 
growth"). Others are general features that could be applied to many tasks (e.g., "shows 
persistence"). However, the criteria are not standardized across tasks; the number of
criteria and the mix of specific and general criteria vary from task to task.

The assessments were developed for classroom use and do not include detailed definitions of 
traits to be rated, nor sample anchor performances. No technical information is included. 

(TC#600.3WHAHAP) 
Horn, Kermit, and Marilyn Olson. 1992-1993 Lane County Fourth Annual Project Fair -
Official Guidelines, Criteria & Registration Forms for Grades K-12. Available from: 
Kermit Horn or Marilyn Olson, Project Fair Coordinators, Instructional Services 
Division, Lane Education Service District, PO Box 2680, Eugene, OR 97402, 
(503) 689-6500. 

This document is the handbook given to students in grades K-12 interested in registering for 
the Lane County project fair. It contains information on registration, criteria by which 
projects will be judged, and help with getting started. 

The document also gives some excellent ideas on interdisciplinary projects. 

Some journal entries from past submissions are included to show students what to do. No 
samples are included that illustrate score points on criteria. The criteria, although an excellent 
start, are still a little sketchy. 

(TC# 000.3LANCOP) 

Johnson, David W., and Roger T. Johnson. Group Assessment as an Aid to Science
Instruction, 1990. Located in: Champagne, Lovitts and Calinger (Eds.), Assessment in
the Service of Instruction, 1990, pp. 267-282. Available from: American Association
for the Advancement of Science, 1333 H St. NW, Washington, DC 20005 [AAAS Books:
(301) 645-5643]. Also located in: G. Kulm & S. Malcom (Eds.), Science Assessment in
the Service of Reform, 1991, pp. 103-126, AAAS.

The authors favor cooperative learning in science because of research that shows positive
effects on student learning and attitudes. Their suggestions for group assessment build on this
same philosophy: group assessment involves having students complete a lesson, project, or
test in small groups while a teacher measures their level of performance. If done well, this
format allows assessment of outcomes that are difficult to assess in other ways, such as
reasoning processes, problem solving, metacognitive thinking, and group interactions. The
authors also maintain that it increases the learning it is designed to measure, promotes positive
attitudes toward science, parallels instruction, and reinforces the value of cooperation. The
article describes how to structure performance tasks in a cooperative framework.

The authors then describe, in general terms, different ways to record the information from the task:
observational records, interviews, individual and group tests, etc. This is only a general overview
of the possibilities, however, and provides no specific rubrics, forms, or questions.



(TC#600.6GROASA) 
Jones, Lee R., Ina V.S. Mullis, Senta A. Raizen, et al. The 1990 Science Report Card:
NAEP's Assessment of Fourth, Eighth, and Twelfth Graders. Available from: Education
Information Branch, Office of Educational Research and Improvement, US Dept. of
Education, 555 New Jersey Ave NW, Washington, DC 20208, (800) 424-1616.

The National Assessment of Educational Progress (NAEP) is a congressionally mandated 
project of the National Center for Education Statistics, part of the US Department of 
Education. This document includes the results of the 1990 Science Assessment from NAEP. 
It includes sample exercises, some multiple-choice and others open-ended.

(TC# 600.6SCEREC2) 

Jones, M. Gail. Performance-based Assessment in Middle School Science. Located in:
Middle School Journal 25, March 1994, pp. 35-38. 

The author presents some ideas on how to do performance-based classroom assessments of 
science process skills. She recommends reviewing units of study to analyze the process skills 
being emphasized, identifying the observable aspects of each skill, and designing tasks that 
allow you to observe the skills. The author illustrates this process both by
giving an extended example on coastal ecology, and by listing science process skills, some 
observable behaviors, and related examples of tasks students could be given to elicit the 
behavior. The discussion of how to score responses is not as complete. No technical 
information is provided. 

(TC#600.6PERBAS) 

Jungwirth, Ehud, and Amos Dreyfus. Analysis of Scientific Passages Test, 1988. Available
from: Educational Testing Service, Tests in Microfiche, TC922019, Set R. Also
referenced in: "Science Teachers' Spontaneous, Latent or Non-attendance to the
Validity of Conclusions in Reported Situations," Research in Science and Technological
Education 8, 1990, pp. 103-115.

The authors have developed a measure of science teacher critical thinking. Teachers are 
asked to identify the similarities between two passages in which the same invalid conclusion is 
reached. Passages illustrate "post hoc" thinking; attributing causality to an antecedent event; 
drawing conclusions about a population from a non-representative sample; and acceptance of 
tautologies as explanations. Teachers respond in writing. There are four forms. No technical 
information, except performance of one group of 76 teachers, is available. 

(TC#600.4ANASCP) 
Kamen, Michael. Use of Creative Drama to Evaluate Elementary School Students'
Understanding of Science Concepts, 1991. Located in: G. Kulm and S. Malcom (Eds.),
Science Assessment in the Service of Reform, 1991, pp. 338-341. Available from:
American Association for the Advancement of Science, 1333 H St. NW,
Washington, DC 20005 [AAAS Books: (301) 645-5643].

This article emphasizes kinesthetic learning—reinforcing and assessing knowledge of scientific 
concepts through acting them out. For example, students can demonstrate their knowledge of 
waves by forming a line and creating waves with different wavelengths and amplitudes. Other
examples are given for air pressure, solar energy, and land snails. Assessment appears to
consist of judging how well students illustrate the concept. No other
performance criteria are discussed. Tasks were designed for students in grades K-6.

(TC#600.6USECRD) 

Kanis, I. B. Ninth Grade Lab Skills. Located in: The Science Teacher, January 1991,
pp. 29-33. 

This paper provides a summary description of the six performance tasks given to ninth grade 
students as part of the 1985-86 Second International Science Study to assess laboratory skills. 
A brief description, a picture of the lab layout, and a list of scoring dimensions are provided 
for each task. It appears that scoring is essentially right/wrong and task-specific. Students 
were scored on ability to manipulate material, collect information, and interpret results. A 
brief discussion of some results of the assessment is provided. There is enough information
here to try out the tasks, but not enough to use the performance criteria. No sample student 
performances are included. The paper also discusses problems with many current lab 
activities (too cookbook) and how to redesign lab exercises to promote higher-order thinking 
skills. 

(TOtfo 5NINGRL) 

Kentucky Department of Education. Kentucky Instructional Results Information System, 
1991-92. Available from: Advanced Systems in Measurement & Evaluation, Inc., PO 
Box 1217, 171 Watson Rd., Dover, NH 03820, (603) 749-9102. Also available from: 
Kentucky Department of Education, Capital Plaza Tower, 500 Mero St., Frankfort,
KY 40601, (502) 564-4394. 

This document contains the released sets of exercises and related scoring guides from 
Kentucky's 1991-92 grade 4, 8, and 12 open-response tests in reading, math, science, and 
social studies. It does not contain any support materials such as: rationale, history, technical 
information, etc. 

There are three to five tasks/exercises at each grade level in each subject. Most are open- 
response (only one right answer), but some are open-ended (more than one right answer). 
Examples in math are: write a word problem that requires certain computations, determine
how many cubes are needed for a given figure, follow instructions, explain an answer, arrange
a room, and explain a graph. Examples in science are: experimental design for a spot remover,
graph and interpret results of a study on siblings, and predict the weather from a weather map.
Each exercise has its own set of scoring criteria. The scoring appears to emphasize the
correctness of the response rather than the process by which the response was obtained.

(TC#060.3KENINR) 



Kentucky Department of Education. Performance Events, 1992-93, Grade 8. Available 
from: Kentucky Department of Education, Capital Plaza Tower, 500 Mero St., 
Frankfort, KY 40601, (502) 564-4394. 

This document includes three performance tasks and related scoring guides from the 1993 
grade 8 assessment. The tasks relate to mapping the ocean floor, identifying bones, and water 
pollution. There is both group and individual work using a variety of manipulatives. Each 
task consists of a series of related questions, some of which have only one right answer and
some of which are more open-ended. Scoring employs task-specific scoring guides developed 
from a generic guide that addresses completion of the task, understanding, efficiency/ 
sophistication, and insight. Scored student responses are included. No technical information 
is included. 

(TC#600.3PEREVG) 



Koballa, T.R. Goals of Science Education, 1989. Located in: D. Holdzkom and P. Lutz 
(Eds.), Research Within Reach: Science Education, pp. 25-40. Available from:
National Science Teachers Association, Special Publications Department, 
1742 Connecticut Ave. NW, Washington, DC 20009, (202) 328-5800. 

Assessment should be designed to cover important student processes and outcomes. This 
article is included because it discusses what our goals for students should be. Specifically, the 
author maintains that most science curricula are oriented toward students who want to
pursue science academically and professionally. We should also, however, be looking at
science education as a means of promoting other important goals for students, such as a longing
to know and understand, respect for logic, and the capacity to cope
with change.



(TC#600.5GOASCE) 



Kober, Nancy. What We Know About Science Teaching and Learning, 1993. Available 
from: Council for Educational Development and Research (CEDaR), 
2000 L Street, NW, Suite 601, Washington DC 20036, (202) 223-1593. 

This booklet provides a very nice summary and overview of the changes in science instruction 
and assessment and the reasons for the changes. It includes short sections on such topics as: 
why science is important for all citizens, why science instruction needs to change, instructional
ideas, implications for policy, curriculum standards, how to send the message that science is 
important, equity issues, instructional methods, staff development needs of teachers, and the 
role of parents and the community. 

(TC#600.6WHAKNS) 



Kulm, Gerald, and Shirley M. Malcom. Science Assessment in the Service of Reform, 1991.
Available from: American Association for the Advancement of Science, 1333 H St. NW, 
Washington, DC 20005 [AAAS Books: (301) 645-5643]. 

This book contains articles from various authors who discuss: current issues surrounding 
science assessment, the rationale for considering alternatives, curriculum issues and trends, 
and alternative assessment initiatives in various states and countries. There are good 
summaries of what is occurring with the National Assessment of Educational Progress, with 
test publishers in England and Wales, and with various states. An appendix presents brief 
descriptions of alternative assessments under development by various organizations. The 
individual articles that appeared to be of most interest for the purpose of this bibliography are 
entered separately. 

(TC#600.6SCIASI) 



Lawrence, Barbara. Utah Core Curriculum Performance Assessment Program: Science, 
1993. Available from: Profiles Corporation, 507 Highland Ave., Iowa City, IA 52240. 

The Utah State Office of Education has developed 90 constructed response items in 
mathematics, science and social studies (five in each of grades 1-6 for each subject) to 
complement multiple-choice tests already in place. Assessments are designed to match the 
Utah Core Curriculum. Although districts must assess student status with respect to core- 
curriculum goals, use of the state-developed assessments is optional. 

The science assessments are designed to measure four general process skills: 
identify/describe, explain, infer, organize, and create. Each task has several questions relating 
to the same theme. For example, one grade 3 task takes students through a simulated walk in 
the woods. A series of activities asks students to do such things as: "Color an animal and its 
surroundings in a way that shows how the animal uses camouflage..." and "Next to each
animal paste the picture of an animal or animals likely to use that shelter." Most student
responses are short (some are multiple-choice); the longest are no more than a paragraph.

Scoring is task-specific and based either on getting the correct answer (e.g., the score for 
pasting animals next to shelters is 0-3 depending on how many are done correctly) or quality 
of the response (e.g., the score for camouflage is 0-2, where 2 is "student colors one of the 
animals in a way that enhances its camouflage" and 1 is "student partially addresses the 
task"). Points are totaled for each task, and across tasks for each of the four process skills
assessed. Four levels of proficiency on each skill are identified: advanced, proficient, basic,
and below basic. Cut scores for each level are based on percent correct (approximately
90% = advanced, 70% = proficient, 40% = basic, below 40% = below basic) and on behavioral
descriptions of performance at each level.
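To make the percent-correct arithmetic concrete, the following is a minimal Python sketch, not the Utah program's actual procedure. It assumes each item score is recorded as a (skill, points earned, points possible) tuple, that points are pooled across tasks by skill, and that each approximate cut score is an inclusive lower bound (e.g., exactly 90% counts as advanced). The names `proficiency_level` and `skill_levels` are hypothetical, and the sketch ignores the behavioral descriptions that also inform the levels.

```python
from collections import defaultdict

# Hypothetical sketch of the approximate percent-correct cut scores described
# for the Utah science assessments. Treating each cut as an inclusive lower
# bound is an assumption; the program also used behavioral descriptions.
CUTS = [(90, "advanced"), (70, "proficient"), (40, "basic")]


def proficiency_level(points_earned: int, points_possible: int) -> str:
    """Map a skill's pooled points to one of the four proficiency levels."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    percent = 100 * points_earned / points_possible
    for cut, label in CUTS:
        if percent >= cut:
            return label
    return "below basic"


def skill_levels(item_scores):
    """Pool (skill, earned, possible) item scores across tasks, by skill."""
    totals = defaultdict(lambda: [0, 0])
    for skill, earned, possible in item_scores:
        totals[skill][0] += earned
        totals[skill][1] += possible
    return {skill: proficiency_level(e, p) for skill, (e, p) in totals.items()}


if __name__ == "__main__":
    # Illustrative scores only; the skill assignments are hypothetical.
    scores = [("explain", 2, 2), ("explain", 1, 3), ("infer", 3, 3)]
    print(skill_levels(scores))
```

With these illustrative scores, the pooled "explain" total of 3 out of 5 points (60%) falls in the basic band, while "infer" (3 of 3) is rated advanced.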

Assessment activities are bound in books for each grade level/subject. Each task includes 
teacher directions, student test-taking materials, and scoring guides. The Office of Education 
has collected information on teacher reaction to the assessments from the field test. No other 
technical information is available at this time. An introductory training video is available 
which helps teachers use the assessment program (but does not deal specifically with science).

(TC#600.3UTACOC and TC#000.6INTUTC)



Lawrence Hall of Science. Full Option Science System— Water Module, 1992. Available 
from: Encyclopedia Britannica Educational Corporation, 310 S. Michigan Ave., 
Chicago, IL 60604, (800) 554-9862. Also available from: Lawrence Hall of Science, 
University of California, Berkeley, CA 94720, (510) 642-8941. 

The Full Option Science System is a series of hands-on instructional modules with associated 
assessments. The module reported here is on water. There are three parts to the assessment, 
all of which are described in detail in the document. The first part is a series of hands-on tasks 
set up in stations. Examples are: "Put three drops of mystery liquids on wax paper and 
observe what happens." and "What do your observations tell you about the mystery liquids?" 
Two different testing configurations are outlined (8 students and 24 students). Each group 
takes about 30 minutes. The second part of the assessment is an open-response
paper-and-pencil test that takes about 15 minutes. The third part of the assessment is an application of
concepts in paper-and-pencil format that takes about 20 minutes. All answers are scored for
degree of correctness. 

Administration and scoring information is provided, but neither technical information on the tests
nor information about typical performance is given.

(TC#600.3FOSSWM) 



Lee, Elaine P. Discovering the Problem of Solid Waste: Performance Assessments, 1991. 
Available from: Lake County Educational Service Center, 19525 W. Washington St., 
Grayslake, IL 60030, (708) 223-3400. 

In this booklet, 17 performance tasks are presented for students in grades 3-6. The tasks are
based on an instructional manual used to teach the topic of solid waste; they assess knowledge of
the topic and measure the ability to apply that knowledge in hands-on activities. Not all the
tasks are appropriate for each of the grades.

Each performance task contains information about grade level, concepts being assessed (e.g., 
types of solid waste or recognizing changes in materials in a landfill), process skills needed to 
complete the task (e.g., classifying, measuring, observing, or ordering), and the objects/items 
needed for the task, directions, and questions to answer. Many of the tasks are completed at
home or at a work station in the classroom. 

Scoring emphasizes the correctness of the response; the scoring guides are different for each 
task. The guide provides information on the maximum points to assign for each question and 
for the entire task. 

No information on staff training or technical information is provided. 
(TC#620.3DISPRS) 



Liftig, Inez Fugate, Bob Liftig, and Karen Eaker. Making Assessment Work: What
Teachers Should Know Before They Try It. Located in: Science Scope 15, March 1992,
pp. 4-8.

The authors contend that students have trouble taking alternative assessments because they
have no practice doing so. For example, they don't know the higher-order thinking skills
vocabulary that is often used in performance tasks, so they don't know what to do. They also
don't know what it takes to do well. The authors recommend that students learn the vocabulary,
practice oral and written communication, and take care not to leave anything out on the
assumption that the teacher already knows that the student knows it. A list of vocabulary is
included.

(TC#600.6MAKASW) 



Lock, Roger. Gender and Practical Skill Performance in Science. Located in: Journal of 
Research in Science Teaching 29, 1992, pp. 227-241. 

This paper is included here not because of its findings on gender differences among
high school students, but because of its brief descriptions of
the performance tasks used, procedures, and method of scoring student performances. The
four tasks were: measuring the rate of movement of blow fly larvae in dry and damp 
atmospheres, finding out how the size of the container with which a burning candle is covered 
affects the length of time for which the candle burns, determining the mass supported by a 
drinking straw, and identifying an unknown solution. Only one of these (straws) is described 
in enough detail to replicate. There are separate performance criteria for each task. Student 
performance is assessed live by listening to what the student says while he or she does the 
task, by watching what the student does, and by looking at what the student writes down. 
The criteria for the unknown solution task are given. 

Because of the nature of the research reported, some technical information is included on the 
tasks. An attempt to obtain more information from the author was unsuccessful. 

(TC#600.6GENPRS)
Lunetta, Vincent N., and Pinchas Tamir. Matching Lab Activities. Located in: The
Science Teacher 46, May 1979, pp. 23-25. 



The authors list 24 skills and behaviors related to the scientific process, recommend using 
these skills to analyze the tasks given to students to make sure that students are being required 
to apply/use all the skills of importance, and report on a study in which they analyzed several 
tasks using the list. They discovered that most lab activities do not require students to use
many of the skills on the list. 



(TC#600.6MATLAA) 



Macdonald Educational. Learning Through Science, 1989. Available from: Macdonald 
Educational, Wolsey House, Wolsey Road, Hemel Hempstead HP2 4SS, England, UK. 
Also available from: Teachers' Laboratory, Inc., PO Box 6480, Brattleboro, VT 05301, 
(802) 254-3457. 

This is one of a series of publications developed to promote instructional reform in science in 
the United Kingdom. The reform movement emphasizes active learning and concept 
development. (An overview of this curriculum reform movement is included in TC#
600.6SCIASI--Wynne Harlen, Performance Testing and Science Education in England and
Wales.)

In addition to sections covering such topics as "why do science" and how to organize 
instruction, one chapter covers record keeping. This chapter proposes keeping track of 
student development toward mastery of broad scientific concepts and habits of thought rather 
than keeping track of activities completed. The chapter provides a brief description of a rating 
procedure (presented in more detail in another publication) for 24 attributes such as: 
curiosity, perseverance, observing, problem solving, exploring, classifying, area, and time. A 
sample five-point rating scale for one of the attributes, curiosity, is given. 

An appendix to the book also provides developmental continua for: attitudes, exploring 
observations, logical thinking, devising experiments, acquiring knowledge, communicating, 
appreciating relationships, and critical interpretation of findings. These could be adapted for 
use in keeping track of student progress in a developmental fashion. 



(TC#600.6LEATHS) 



Macdonald Educational. With Objectives in Mind, 1984. Available from: Macdonald 
Educational, Wolsey House, Wolsey Road, Hemel Hempstead HP2 4SS, England, UK. 
Also available from: Teachers' Laboratory, Inc., PO Box 6480, Brattleboro, VT 05301, 
(802) 254-3457. 

This is one of a series of publications developed to promote instructional reform in science in 
the United Kingdom. This instructional reform emphasizes active learning and concept 
development. (An overview of this curriculum reform movement is included in
TC#600.6SCIASI--Wynne Harlen, Performance Testing and Science Education in England and Wales.)



This document covers such topics as the contribution of science to early education, objectives 
for children learning science, and how to use the various instructional units that have also been 
produced as part of this series. There is a good discussion of how student understanding in 
science develops, which includes many samples of student behavior as illustrations of the 
various stages. This discussion could be adapted to constructing developmental continua for 
tracking student progress to be used for performance assessment. 

(TC#600.6WITOBM) 

Marshall, G. Evaluation of Student Progress, 1989. Located in: D. Holdzkom and P. Lutz
(Eds.), Research Within Reach: Science Education, pp. 59-78. Available from:
National Science Teachers Association, Special Publications Department, 
1742 Connecticut Ave. NW, Washington, DC 20009, (202) 328-5800. 

This paper presents a general overview of assessment development targeted at classroom 
teachers. The author emphasizes the need to clearly define outcomes for students and then 
match each outcome to the proper assessment technique: multiple-choice, essays, projects,
practical tests, and lab reports. Examples of each item type (using science content) are
included.



(TC#600.6EVASTP) 

Martinello, Marian L. Martinello Open-ended Science Test (MOST), 1993. Available from:
University of Texas at San Antonio, Division of Education, San Antonio, TX 78249, 
(210) 691-5403, fax: (210) 691-5848. 

This assessment is designed to be administered as a pretest and posttest of scientific 
observation, inference, and supporting evidence skills for children in grades 2-5. A child is 
given an unknown object to examine (e.g., a crinoid, sweet gum seedpod, oak gall) and is 
asked to respond to three specific questions: (1) What do you see? (2) What do you think it 
is? (3) Why do you think so? 

The test may be administered to individual children by soliciting oral responses or to class 
groups of children by soliciting written responses. All responses are open-ended. Responses 
are scored by assigning points for each reasonable observation made, inference made or piece 
of supporting evidence given by the student. 

The document includes a description of the general procedure and scored examples of student 
responses to "oak galls" and "seed pods." Technical information is available from the author.
Also, samples of student written responses are available. 
Martinez, M. Figural Response in Science and Technology Testing, 1991. Located in: G.
Kulm and S. Malcom (Eds.), Science Assessment in the Service of Reform, 1991,
pp. 384-390. Available from: American Association for the Advancement of Science,
1333 H St. NW, Washington, DC 20005 [AAAS Books: (301) 645-5643].

This paper briefly describes two field tests of "figural response" items. In figural response 
items students draw graphs, label diagrams, etc. These items can be computer-scored because the
computer checks whether key features have been placed in expected locations on the answer sheet. For
example, did the graphed line extend up to the expected point?
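As a rough illustration of the kind of location check described, here is a minimal Python sketch. It assumes, hypothetically, that each key feature (for example, the endpoint of a graphed line) has an expected rectangular target region in answer-sheet coordinates and that one point is awarded per feature placed inside its region. The paper's actual scoring rules, coordinate system, and tolerances are not reproduced here, and the names `Region` and `score_figural_item` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical target box on the answer sheet (arbitrary units)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def score_figural_item(marks: dict, key: dict) -> int:
    """Award one point for each key feature placed inside its expected region.

    marks maps a feature name to the (x, y) position the student gave it;
    a feature the student did not place is simply absent from marks.
    """
    score = 0
    for feature, region in key.items():
        position = marks.get(feature)
        if position is not None and region.contains(*position):
            score += 1
    return score


if __name__ == "__main__":
    key = {
        "line_end": Region(9.5, 10.5, 4.5, 5.5),  # where the graphed line should stop
        "label_A": Region(0.0, 2.0, 0.0, 2.0),    # where a diagram label belongs
    }
    student = {"line_end": (10.0, 5.2), "label_A": (3.0, 1.0)}
    print(score_figural_item(student, key))  # prints 1: line_end inside, label_A outside
```

A tolerance box rather than an exact point is used here because hand-drawn or dragged features rarely land on a single exact coordinate; the size of such a box would be a design decision for the item writer.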

The first experiment involved field testing 25 items to determine their feasibility for the 
National Assessment of Educational Progress. The second experiment involved computer- 
delivered items in which features could be moved around on the screen. 

Several examples of items are provided. 

(TC#600.6FIGRES) 

McColskey, Wendy, and Rita O'Sullivan. How to Assess Student Performance in Science:
Going Beyond Multiple-Choice Tests, June 1993. Available from: SouthEastern
Regional Vision for Education (SERVE), PO Box 5367, Greensboro, NC 27435,
(800) 755-3277.

This publication presents a nice, relatively short, summary of assessment possibilities and steps 
for developing assessments in science. There are sections that cover: deciding on student 
outcomes, matching outcomes to assessment type, developing performance criteria, and 
reflecting on grading practices. 

(TC#600.6HOWASS) 

Medrich, Elliott A., and Jeanne E. Griffith. International Mathematics and Science
Assessments: What Have We Learned?, 1992. Available from: National Technical
Information Service, US Dept. of Commerce, 5282 Port Royal Rd., Springfield, VA
22161, (703) 487-4650.

This report provides a description of international assessments of math and science (First
International Mathematics and Science Studies, 1960s; Second International Mathematics and
Science Studies, 1980s; and First International Assessment of Educational Progress, 1988),
some of their findings, and the issues surrounding the collection and analysis of these data. It 
also offers suggestions about ways in which new data collection standards could improve the 
quality of the surveys and the utility of future reports. 



(TC#000.6INTMAS)
Meinhard, Richard. A Developmental Baseline Profile of 12 Key Elementary Science
Concepts/Processes, 1990. Available from: The Institute for Developmental Sciences, 
3957 E. Burnside, Portland, OR 97214, (503) 234-4600. 

The Oregon Cadre for Assistance to Teachers of Science (OCATS) Developmental
Assessment Project is designed to encourage concept/process-based science education in
order to promote long-range student growth in science. One part of this project was to gather
information on how science concepts develop in students from kindergarten through grade 
five. The concepts were: 

1. Logical-mathematical organization of objects: simple classification, multiple classification,
seriation, and whole number operations.

2. Geometrical and spatial relationships of objects: perimeter, area, and multiplicative
projective relationships.

3. Physical properties of objects: quantity, weight, and volume.

4. Experimental reasoning: controlling variables.

5. Causal explanation: proportional reasoning.

One performance task was given to the students for each concept area. Performance was 
rated using a holistic developmental scale with four stages: sensory-motor (student engages in 
the activity without representational thought of the activity), preoperational (intuitive, no real 
understanding), operational (conceptual understanding under some circumstances), and formal 
(concept used as a variable in a more complex system of explanatory reasoning). Each stage 
has two substages, so the final scale has eight points.

After discussing the results for the sample of 40 K-5 students in this study, the authors point 
out that the advantages of assessing students in this fashion are in knowing: 

1. The readiness of students to handle instruction of certain types

2. How to teach concepts to students in ways they can understand 

3. What needs to be done to move the student to higher developmental levels 

Neither the performance tasks nor the scoring techniques are described in detail in this paper.
No technical information, other than the distribution of performance, is included.

(TC#600.6DEVBAP)
Mergendoller, J.R., V.A. Marchman, A.L. Mitman, and M.J. Packer. Task Demands and
Accountability in Middle-Grade Science Classes, 1987. Located in: Elementary School
Journal 88, pp. 251-265.



The authors maintain that the types of thinking students engage in and the quality of learning 
that occurs are largely influenced by the nature of the tasks students complete. After 
analyzing a large number of instructional and assessment tasks given to eighth graders, the 
authors conclude that, in general, the tasks given to students present minimal cognitive demands.
The article also provides suggestions about analyzing and modifying curriculum tasks. 

Although not strictly about assessment, the article is included here to reinforce the notion that,
as in instruction, the task given to students in a performance assessment affects how well
one can draw conclusions about students' ability to think: if students are not given performance
tasks that require thinking, it is difficult to analyze their responses for thinking ability.

(TC#600.6TASDEA)

National Assessment of Educational Progress (NAEP--1987). Learning by Doing: A
Manual for Teaching and Assessing Higher-Order Thinking in Science and Mathematics.
Report No. 17-HOS-80, 1987. Available from: Educational Testing Service, CN 6710,
Princeton, NJ 08541, (800) 223-0267.

The National Assessment of Educational Progress was established in 1969 to monitor student 
achievement status and trends. Samples of students aged 9, 13, and 17 are tested periodically,
with science assessments having occurred in 1970, 1973, 1982, 1986, and 1990.

Learning by Doing is an overview of a pilot test of "higher-order thinking skills" that was 
added to the 1986 assessment. This pilot consisted of 30 tasks/items in the areas of 
sorting/classifying, observing/formulating hypotheses, interpreting data, and 
designing/conducting an experiment. The tasks included open-ended paper and pencil items, 
use of equipment at stations, and complete experiments. Learning by Doing briefly describes 
11 of the exercises presented to students. Scoring is not described in detail. (The full report
is available from NAEP at the above address.) 

Lisa Hudson in chapter 4 of Assessment in the Service of Instruction (TC#600.6ASSINT) 
discusses some issues with respect to this pilot test and the 1990 science assessment. These 
include whether the performance items provide enough extra information to justify the time
and cost of giving them; how the ability to read, listen, and write might affect scores; and whether
this type of task would differentially encourage inquiry-based instruction. (These are 
questions that relate to all performance assessments and not just the NAEP pilot.) 
National Center for Improving Science Education. Getting Started in Science: A Blueprint
for Elementary School Science Education, 1989. Available from: National Center for 
Improving Science Education, 2000 L St NW, Suite 602, Washington, DC 20036, 
(202) 467-0652. Also available from ERIC: ED 314 238.

This report covers such topics as the rationale for science instruction, how children learn 
science, teacher development and support, and assessment. The chapter on assessment 
promotes the idea of assessment in the service of instruction: measuring the full range of
knowledge and skills required for science, alignment with instruction, and a range of 
assessment approaches. 

The authors outline the characteristics of a good assessment system, including characteristics 
of tests, measuring affective as well as cognitive dimensions, and assessing instruction and 
curriculum. 

(TC#600.6GETSTS) 

National Research Council. National Science Education Standards - Discussion Standards, 
May 1994. Available from: National Committee on Science Education Standards and 
Assessment (NCSESA), 2101 Constitution Ave., NW, HA 486, Washington, DC 20418, 
(202) 334-1399, fax (202) 334-3159, e-mail: scistnd@nas.edu. 

The National Science Education Standards are organized into five categories: program, 
teaching, content, assessment, and system. These describe what students should learn and 
how they should be taught and assessed. They are broken down by grade range (K-4, 5-8,
9-12). Each range covers standards for learning, teaching, and assessing science, plus content
standards for: science as inquiry; physical science; life science; earth/space science; science and
technology; science and societal challenges; and, lastly, history and nature of science. A final set
of standards for K-12 relates to unifying concepts and processes.

This is a discussion draft. 

(TC# 600.5NATSCE) 

National Science Teachers Association. Scope, Sequence and Coordination of Secondary 
School Science, Volume 1: The Content Core, A Guide for Curriculum Designers, 1992. 
Available from: The National Science Teachers Association, Special Publications Dept., 
1742 Connecticut Ave. NW, Washington, DC 20009, (202) 328-5800.

This book outlines curriculum standards for secondary science (grades 6-12). The document 
emphasizes the need to do more than have students memorize facts, the philosophy that 
students need to be involved in the practical applications of science, the approach that the 
various subject areas need to be coordinated, the theory that all students need to be 
scientifically literate, and the belief that students learn best when they construct their own
meaning. However, the scope itself concentrates mainly on the knowledge part of the
curriculum. 

(TC#600.5SCOSEC) 

New York Department of Education. New York State Program Evaluation Test in Science, 
Grade 4, 1991. Available from: The University of the State of New York, The State 
Education Department, Albany, NY 12234. 

This entry includes two documents from the New York Department of Education. The one 
listed in the title provides an overview of the 1991 Grade 4 science test and complete 
descriptions of the performance tasks (materials needed, set-up and administration). There 
are five stations: Measuring Objects, Water on Objects, Grouping Objects, Electrical Testing,
and Mystery Box. It takes students about an hour to circulate to all five stations. Scoring
information is in another document that we do not have; therefore, there is no indication of
what skills these activities are attempting to measure. There is also no technical information
or sample student work. Mystery Box is cataloged as a separate document (see the TC number
below).

The second document, Guide to Program Evaluation K-4, describes a model for evaluating
K-4 science programs. The components include: objective test, manipulative test, student 
science attitudes survey, analysis of instructional activities, and science program environment 
surveys. This document describes the system, the rationale for the system, and provides 
worksheets for reporting and using information. The actual surveys are included in another 
document we do not have. 

(TC#60(UNEWYOS AND 600.3MYSBOX) IN-HOUSE USE ONLY 

O'Rafferty, Maureen Helen. A Descriptive Analysis of Grade 9 Pupils in the United States
on Practical Science Tasks, 1991. Available from: University Microfilms International 
Dissertation Services, 300 N. Zeeb Rd., Ann Arbor, MI 48106, (800) 521-0600, 
microfilm #913 5126. 

This dissertation is a re-analysis of some of the information from the Second International
Science Study (SISS, 1986), but it also includes a good description of the performance
portion of the SISS and three of its six performance tasks. (The SISS also contained a
multiple-choice portion and several surveys.)

The three tasks included in this document (Form B) were: determining the density of a sinker, 
chromatography observation and description, and identifying starch and sugar. The other 
three tasks (Form A) in the SISS, not included in the document, are: using a circuit tester, 
identifying solutions by pH, and identifying a solution containing starch. Each task has a series
of questions (10-11 in total) for the student to answer using the equipment provided. The
questions asked students to observe, calculate, plan and carry out a simple experiment,
explain, and determine results. Each subquestion was classified as being one of three types of
process skills: performing, reasoning, or investigating. The six tasks were set up at 12
alternating stations (A, B, A, B, ...). Twelve students could be tested every 45 minutes.

One to two points were given for each answer. The basis for assigning points is not clear,
but it appears to be a judgment of the correctness of the response.

The dissertation includes a number of student responses to the tasks, overall performance of 
the US population, and several interpretations of the results. For example, student
performance on questions classified as measuring the same skill differed widely. The
author speculates that this is either because the definitions of the skills are imprecise, or 
because such unitary skills don't exist. 

The author also examined student responses for patterns of errors, and discussed the 
implications of this for instruction. 

(TC#600.3DESANP) 

Ostlund, Karen. Sizing Up Social Skills. Located in: Science Scope 15, March 1992, 
pp. 31-33. 

The author presents a taxonomy of social skills important for the science classroom, provides 
a few ideas for how to teach them, and offers a couple of ideas on student and teacher 
monitoring techniques. 

(TC#223.6SIZUPS) 

Padian, Kevin. Improving Science Teaching: The Textbook Problem. Located in: 
Skeptical Inquirer 17, Summer 1993, pp. 388-393. 

Although not strictly about assessment, this article is included because it discusses the nature 
of the tasks and activities that we give students to do. One of the major points of the article is 
that giving students "hands-on" activities doesn't ensure "good" activities. If we don't craft 
our tasks to get at the heart of what we want to accomplish with students, the tasks will be 
worthless both as instruction and assessment tools. 

(TC#600.6IMPSCT) 

Pine, Jerry, Gail Baxter, and Richard J. Shavelson. Assessments for Hands-On Elementary
Science Curricula, 1991. Available from: Physics Department, California Institute of
Technology, Pasadena, CA 91125, (818) 356-6811. 

The authors present the case that science curriculum should enable students to learn how to 
pursue an experimental inquiry, and should give them the ability to construct new knowledge 
from their observations. Assessment should match this, but the authors question whether it is
always necessary to have hands-on assessment tasks. The authors designed a study that
compared observer ratings of fifth- and sixth-grade student performance of hands-on tasks with
five other surrogates: ratings of student lab notebooks that covered the same hands-on tasks, 
a computer simulation of the tasks, free-response paper and pencil questions, multiple-choice 
items, and California Test of Basic Skills (CTBS) scores. The surrogates (with the exception 
of the CTBS) were designed to parallel the hands-on tasks as closely as possible. 

This paper reports on the relationship of observer ratings, notebook ratings, simulations, and 
CTBS scores. Results showed: 

1. It was possible to get consistent ratings of student performance on hands-on tasks with
trained observers.

2. Ratings of lab notebooks were a promising surrogate for observations, but the notebooks
have to be designed carefully.

3. Computer simulations, open-ended questions, and multiple-choice questions were not good
surrogates.

4. CTBS scores were moderately related to hands-on performance, but appeared mainly to
reflect general verbal and numerical skills.

5. In order to assess inquiry instruction rather than general natural ability, hands-on tasks need
to be carefully designed.

The paper briefly describes all the tasks used in the study, but does not present them in 
enough detail to replicate. A companion paper, New Technologies for Large-Scale Science
Assessments: Instruments of Educational Reform (TC#600.3NEWTEF), describes the tasks
in more detail. 

(TC#600.3ASSFOH)

Pomeroy, Deborah. Implications of Teachers' Beliefs About the Nature of Science:
Comparison of the Beliefs of Scientists, Secondary Science Teachers, and Elementary
Science Teachers. Located in: Science Education 77, June 1993, pp. 261-278.

The author reports on a study that asked the question: "Are there differences between how 
scientists and teachers view the nature of science, scientific methodology, and related aspects 
of science education?" She developed a 50-item survey which covered: (1) the nature of
scientific inquiry (is the only valid way of gaining scientific knowledge through inductive
methods using controlled experimentation, or is there a role, as more contemporary views
have it, for dreaming, intuition, play, and inexplicable leaps?); (2) what K-12 science education
should be like; and (3) background information on respondents.

The complete survey and discussion of the results are included in the article.

(TC#600.4TEABEA)






NWREL, August 1994

Test Center - (503) 275-9582



Science 



Psychological Corporation. Integrated Assessment System: Science Performance
Assessment, 1992. Available from: Psychological Corporation, Order Service Center,
PO Box 839954, San Antonio, TX 78283, (800) 228-0752. 



This is a series of seven tasks designed to be used with students in grades 2-8 (one task per 
grade level). The tasks involve designing and conducting an experiment based on a problem 
situation presented in the test. Students are provided various materials with which to work. 
Students may work individually or in teams, but all submitted products must be individually 
generated. Students generate a hypothesis they wish to test, write down (or show using 
pictures) the procedures used in the experiment, record data, and draw conclusions. At the 
end, students are asked to reflect on what they did and answer questions such as: "What 
problem did you try to solve?" "Tell why you think things worked the way they did," and 
"What have you seen or done that reminds you of what you have learned in the experiment?" 
The final question in the booklet asks students how they view science. This question is not 
scored but can be used to gain insight into students' performances. 

Only the written product in the answer booklet is actually scored. (However, the publisher 
recommends that teachers watch the students as they conduct the experiment to obtain 
information about process. A checklist of things to watch for is provided.) Responses can be 
scored either holistically or analytically using criteria which have been generalized so that they 
can be used with any task. The holistic scale (0-6) focuses on an overall judgment of the 
performance based on quality of work, conceptual understanding, logical reasoning, and 
ability to communicate what was done. 

The four analytical traits are experimenting (ability to state a clear problem, and then design 
and carry out a good experiment), collecting data (precise and relevant observations), drawing 
conclusions (good conclusions supported by data), and communicating (use of appropriate 
scientific terms, and an understandable presentation of what was done). Traits are scored on
a scale of 1-4. 

There is a scoring guide that describes the procedure. However, in the materials we obtained, 
there are no student performances provided to illustrate the scoring. No technical information 
about the assessment is included. 
Psychological Corporation. GOALS: A Performance-Based Measure of Achievement -
Science, 1992. Available from: Psychological Corporation, Order Service Center,
PO Box 839954, San Antonio, TX 78283, (800) 228-0752. 

GOALS is a series of open-response questions (only one right answer) that can be used alone 
or in conjunction with the MAT-7 and SAT-8. Three forms are available for 11 levels of the
test covering grades 1-12 for each of science, math, social studies, language and reading. 
Each test (except language) has ten items. On the science test, tasks cover content from the 
biological, physical, and earth/space sciences. Each task seems to address the ability to use a 
discrete science process skill (e.g., draw a conclusion, record data) or use a piece of scientific 
information. The tasks require students to answer a question and then (usually) provide an 
explanation.

Responses are scored on a four-point holistic scale (0-3) which emphasizes the degree of 
correctness or plausibility of the response and the clarity of the explanation. A generalized 
scoring guide is applied to specific questions by illustrating what 3, 2, 1, and 0 responses look
like.

Both norm-referenced and criterion-referenced (how students perform on specific concepts) score
reports are available. Scoring can be done either by the publisher or locally. A full line of
report types (individual, summary, etc.) is available.

The materials we obtained did not furnish any technical information about the test itself. 
(TC#610.3GOALSS)

Raizen, Senta and J. Kaser. Assessing Science Learning in Elementary School: Why, What,
and How? Located in: Phi Delta Kappan, May 1989, pp. 718-722.

This paper describes some of the limitations of current standardized, multiple-choice tests to 
assess science, discusses how these limitations combine with inadequate teacher preparation and
textbooks to create inferior science instruction, and provides a list of questions to ask about 
any test being considered for use. The list of questions includes such things as "Are problems 
with more than one correct solution included?" and "Are there assessment exercises that 
encourage students to estimate their answers and to check their results?" 

(TC#600.6ASSSCL)

Raizen, Senta A., Joan B. Baron, Audrey B. Champagne, et al. Assessment in Elementary 
School Science Education, 1989. Available from: The National Center for Improving 
Science Education, 2000 L St. NW, Suite 602, Washington, DC 20036, (202) 467-0652. 
Also available from: ERIC ED 314 236. 

The authors discuss the following topics: why assessment is important, issues in assessment, 
what to assess, how to assess, using assessment in instruction, and assessment of program 
features. The emphasis is on using assessment to enhance instruction, not to undermine it. A
lengthy appendix describes "fundamental organizing concepts in science that all students, by 
the time they finish sixth grade, should incorporate in the way they think about and engage 
their world." These include: orderliness, cause and effect, systems, scale, models, change, 
structure and function, variations, and diversity. There is a definition of each area and 
examples of K-6 instructional activities. 

This appears to be a longer and more detailed version of TC# 600.6GETSTS, National Center 
for Improving Science Education. 

(TC#600.6ASSELS) 



Riggs, Iris M. and Larry G. Enochs. Toward the Development of an Elementary Teacher's 
Science Teaching Efficacy Belief Instrument, 1989. Available from: ERIC ED 308 068. 

Paper presented at the 62nd Annual Meeting of the National Association for Research in 
Science Teaching, San Francisco, CA. 

This publication reports on a study in which the Personal Science Teaching Efficacy Belief 
Scale and the Science Teaching Outcome Expectancy Scale were administered to measure 
teacher feelings of self-efficacy and outcome expectancy. The authors present evidence that 
the combined instrument is valid for studying elementary teachers' beliefs toward science
teaching and learning. The instrument is included. 

(TC#600.4TOWDEE) 



Riverside Publishing Company. Performance Assessments for ITBS, TAP and ITED
(various levels and subject areas), 1993. Available from: The Riverside Publishing
Company, 8420 Bryn Mawr Ave., Chicago, IL 60631, (800) 323-9540. 

Riverside is publishing a series of open-response items in the areas of social studies, science, 
mathematics, and language arts. Nine levels are available for grades 1-12. They supplement 
achievement test batteries available from the publisher: ITBS, TAP, and ITED. Each level 
uses a scenario to generate a series of related questions, some of which have only one right 
answer, and others of which are more open-ended and generative. 

For example, the science assessments we received center around designing a biology display 
for a local museum (high school) and exploring the web of life (elementary). The biology 
assessment has students design and use classification systems for living things, draw a bar 
graph based on presented information, generalize about muscles, and show knowledge about 
the brain. Tests take 1 1/2 to 2 hours depending on grade level.
No scoring information, sample student performances, or technical information was
included in the materials we received. However, the publisher's catalog indicates that scoring
materials are available and that the tests are normed. 

(TC#060.3PERAST) IN-HOUSE USE ONLY

Roth, Wolff-Michael. Dynamic Evaluation. Located in: Science Scope 15, March 1992,
pp. 37-40. 

The author describes a method by which students plan and report experiments: the Vee Map. 
The Vee Map requires students to list vocabulary related to the topic they are reporting, 
develop a concept map of these terms, describe the experimental design, describe the data 
collected, and present their conclusions. One extended example in earth science is given.
Performance criteria for assessing the Vee Map are sketchy. No technical information is
included.

(TC#630.6DYNEVA)



Rutherford, F. James and Andrew Ahlgren. Science for All Americans: Science Literacy,
1990. Available from: Oxford University Press, Inc., 200 Madison Ave., New York,
NY 10016, (800) 334-4249.

This book discusses science curriculum standards. The premise is that, although not everyone
will be a scientist, future success of humanity requires that everyone have a certain level of
scientific literacy: knowledge, habits of mind, and the desire to be a critical thinker. The
chapters cover the following kinds of goals we should have for students: the scientific
endeavor as a human enterprise, basic knowledge about the world, major scientific themes, 
and habits of mind. 

(TC#600.5SCIFOA)



Scottish Examination Board. Standard Grade - Amended Arrangements in Biology, 1992. 
Available from: Dr. David M. Elliot, Director of Assessment, Ironmills Rd., Dalkeith, 
Midlothian, Edinburgh, EH22 1LE, Scotland, UK, (031) 663-6601. 

The Scottish Examination Board prepares end-of-course tests for a variety of high school 
subjects to certify level of student competence. We have received tests for math, general 
science, and biology. The course syllabus for biology calls for coverage of: the biosphere, the 
world of plants, animal survival, investigating cells, the body in action, inheritance, and 
biotechnology. The goals of the course are knowledge and understanding, problem solving,
practical abilities, and attitudes. (Only the first three are assessed.) There are two main parts
to the assessment for biology: written tests (developed by the Examination Board) and
classroom embedded performance assessments (conducted by teachers according to
specifications developed by the Examination Board). The two parts are combined to rate
student competence. Each goal is rated on a scale of 1-5, and overall performance is rated on a
scale of 1-7 (1 being highest).

Written tests, developed each year, cover knowledge/understanding and problem solving in 
the content areas outlined in the syllabus. Two levels of the test are available: General and 
Credit. Students getting about 50 percent right on the General level obtain a rating of 4; 
about 80 percent right gives a rating of 3. Likewise a score of about 50 percent on the Credit 
test gives a rating of 2, while 80 percent gives a rating of 1. All questions are short answer or
multiple-choice and are scored by degree of correctness of the answer. 

The performance assessments cover techniques (students must demonstrate competence in ten 
areas such as "carrying out a test for starch") and investigations (students are scored for 
"generative skills," "experimentation skills," "evaluation skills," and "recording and reporting 
skills" on each of two investigations). Scoring entails assigning points for various specified 
features of performance, such as 2 points for "producing a table of results with suitable 
headings and units of measurement." 

The package of materials we received included the course syllabus, specifications for the 
written and performance assessments, and copies of the written tests for 1993. It did not 
include technical information or sample student responses. 

(TC#640.3BIOSTG)

Scottish Examination Board. Standard Grade - Amended Arrangements in Science, 1992. 
Available from: Dr. David M. Elliot, Director of Assessment, Ironmills Rd., Dalkeith, 
Midlothian, Edinburgh, Scotland, EH22 1LE.

The Scottish Examination Board prepares end-of-course tests for a variety of high school 
subjects to certify level of student competence. We have received tests for math, general 
science, and biology. The course syllabus for general science calls for coverage of: healthy 
and safe living, an introduction to materials, energy and its uses, and a study of environments. 
Goals are knowledge, problem solving, practical abilities (science process skills), and 
attitudes. (Only the first three are assessed.) There are two main parts to the assessment for
general science: written tests (developed by the Examination Board) and classroom embedded
performance assessments (conducted by teachers according to specifications developed by the
Examination Board). The two parts are combined to rate student competence on a scale of
1-7. (Separate ratings are given overall and for each of the three goals.)

Written tests, developed each year, cover knowledge/understanding and problem solving in 
the content areas outlined in the syllabus. Three levels of the test are available: Foundation, 
General, and Credit. Students getting about 50 percent right on the Foundation level obtain a 
rating of 6; about 80 percent right gives a rating of 5. Likewise, scores on the General
level give ratings of 4 or 3, and scores on the Credit level give ratings of 2 or 1. ("1" is
the highest rating.) All questions are short answer or multiple-choice and are scored for
degree of correctness of the answer.
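As a rough illustration, the written-test rating rule described above can be pictured as a lookup from test level and percent right to a rating on the 1-7 scale. This is a hypothetical sketch, not taken from the Examination Board's materials: it treats "about 50" and "about 80" percent as exact cutoffs, assigns no rating below the lower cutoff (the annotation does not say what happens there), and the function name `written_test_rating` is invented for illustration.

```python
from typing import Optional

# Hypothetical cutoffs: the annotation gives only "about 50" and "about 80" percent.
LOWER_CUTOFF = 50.0
UPPER_CUTOFF = 80.0

# (rating at about 50 percent, rating at about 80 percent) per level; 1 is the highest rating.
LEVEL_RATINGS = {
    "Foundation": (6, 5),
    "General": (4, 3),
    "Credit": (2, 1),
}


def written_test_rating(level: str, percent_right: float) -> Optional[int]:
    """Map a written-test score to a 1-7 rating under the assumed cutoffs."""
    lower_rating, upper_rating = LEVEL_RATINGS[level]
    if percent_right >= UPPER_CUTOFF:
        return upper_rating
    if percent_right >= LOWER_CUTOFF:
        return lower_rating
    # Below the lower cutoff the annotation gives no rating, so none is assigned here.
    return None


if __name__ == "__main__":
    print(written_test_rating("General", 82))     # 3
    print(written_test_rating("Credit", 55))      # 2
    print(written_test_rating("Foundation", 40))  # None
```

The tiered design means a student's best possible rating depends on which level of the test was taken, which is how three separate tests can feed one 1-7 scale.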
The performance assessments cover techniques (students must demonstrate competence in
eight areas such as "measuring pH") and investigations (students are scored for "generative
skills," "experimentation skills," "evaluation skills," and "recording and reporting skills" on
each of two investigations). Scoring entails assigning points for various specified features of
performance, such as 2 points for "clearly identifying the purpose of the investigation in terms
of the relevant variables."

The package of materials we received included the course syllabus, specifications for the 
written and performance assessments, and copies of the written tests for 1993. It did not 
include technical information or sample student responses. 

(TC#610.3SCISTG)

Semple, Brian McLean. Performance Assessment: An International Experiment, 1992. 
Available from: ETS, Scottish Office, Education Department, Rosedale Rd.,
Princeton, NJ 08541, (609) 734-5686. 

This report describes the Second International Assessment of Educational Progress on math 
and science conducted in 1991. Eight math and eight science tasks were given to a sample of
thirteen-year-olds in five volunteer countries (Canada, England, Scotland, USSR, and 
Taiwan). This sample was drawn from the larger sample involved in the main assessment. 

The 16 hands-on tasks are arranged in two 8-station circuits. Students spend about five
minutes at each station performing a short task. Most tasks are "atomistic" in nature; they
measure one small skill. For example, the 8 math tasks concentrate on measuring length,
angles, and area, laying out a template on a piece of paper to maximize the number of shapes
obtained, producing given figures from triangular cut-outs, etc. Some tasks require students 
to provide an explanation of what they did. All 16 tasks are included in this document, 
although some instructions are abbreviated and some diagrams are reduced in size. 

Most scoring appears to be right/wrong. (However, it is not entirely clear how the
explanations are scored; scoring appears to involve some kind of judgment of the
reasonableness of the explanation.) There must also have been some observation of how the
students approached the tasks, because a detailed analysis of such strategies for one problem is given.

Student summary statistics on each task are included. There is a brief summary of teacher 
reactions, student reactions, the relationship between student performance on various tasks, 
and the relationship between performance on the multiple-choice and performance portions of 
the test.

(TC#600.3PERASS) 
Semple, Brian McLean. Science - Assessment of Achievement Programme, 1992. Available
from: Scottish Office Library, New St. Andrews House, Room 4/5 la, Edinburgh, EH1 
3SY, Scotland, UK, (031) 244-4388. 

The "Assessment of Achievement Programme (AAP)" was established by the Scottish Office
Education Department in 1981 to monitor the performance of pupils in grades 4, 7, and 9.
This document reports on the 1990 science assessment. The assessment focused on science 
process skills: observing, measuring, handling information, using knowledge, using simple 
procedures, inferring, and investigating. 

Assessment tasks used two formats: written (select the correct answer and provide a reason 
for the choice); and practical (use manipulatives to select the correct answer and provide a 
reason, or longer investigations such as observing an event and writing down the observation).
The practical portion was set up as (1) circuits of eight stations (four minutes at each station),
or (2) longer investigations of 15-30 minutes. Schools in the assessment sample were also
invited to comment on the types of skills assessed, and describe the science program at their 
schools. 

Detailed scoring guides are not provided in the materials we have. Student responses were 
apparently scored for both the correctness of the answer and the adequacy of the explanation.

The document we have describes the background of the assessment program, provides sample 
written and practical tasks for each skill area assessed, and describes student performance on 
the tasks (by grade level and gender, and over time). Neither technical information nor sample 
student performances are included.

(TC# 600.3SCAASA) 



Shavelson, Richard J., Neil B. Carey, and Noreen M. Webb. Indicators of Science
Achievement: Options for a Powerful Policy Instrument. Located in: Phi Delta
Kappan, May 1990, pp. 692-697.

The authors review reasons for moving from multiple-choice tests of science achievement to
more performance-based measures, and then discuss three examples: looking at how well
students can move between different representations of a problem, mental models, and
performance assessments/surrogates.

(TC#600.6INDOFS)






NWREL, August 1994    48    Science
Test Center, (503) 275-9582



Shavelson, Richard J., Gail P. Baxter, Jerry Pine, and J. Yure. New Technologies for
Large-Scale Science Assessments: Instruments of Educational Reform, 1991. Available
from: University of California, 552 University Rd., Santa Barbara, CA 93106,
(805) 893-8000.

This document is a series of papers that report in more detail on the studies of hands-on 
versus surrogate assessment tasks also described in Assessments for Hands-On Elementary 
Science Curricula (TC#600.3ASSFOH). This includes more detailed descriptions of the three 
hands-on tasks (paper towels, sow bugs, and electric mysteries) and computer simulations. 
Findings, in addition to those reported in the companion paper, include: 

1. Although observers could be trained to be very consistent in their ratings, a major source of
error still lies in the tasks chosen. That is, the judgment about the level of an individual's
performance depends greatly on the particular task used.

2. Hands-on assessment provides different information from that provided by paper-and-pencil
tests.

For information in addition to that reported in this paper and its companion paper, see the
following references.

Baxter, Gail P., Richard J. Shavelson, Susan Goldman, and Jerry Pine. Evaluation of
Procedure-Based Scoring for Hands-On Science Assessment. Located in: Journal of Educational
Measurement 29, 1992, pp. 1-17. (TC#600.3EVAPRB)

Shavelson, Richard J., and Gail P. Baxter. What We've Learned About Assessing Hands-On
Science. Located in: Educational Leadership, Vol. 49, No. 8, May 1992, pp. 20-25.
(TC#600.3WHAWEL)

Shavelson, Richard J., Gail P. Baxter, and Jerry Pine. Performance Assessments: Political
Rhetoric and Measurement Reality. Located in: Educational Researcher, Vol. 21, No. 4,
May 1992, pp. 22-27. (TC#600.3PERASP)

Shavelson, Richard J., Maria Araceli Ruiz-Primo, and Gail P. Baxter. On the Stability of
Performance Assessments. Located in: Journal of Educational Measurement 30, Spring 1993,
pp. 41-53. (TC#600.6ONSTAP)

(TC#600.3NEWTEF) 
Small, Larry. Science Process Evaluation Model, 1992. Available from: Schaumburg
Community Consolidated District #54, 524 E. Schaumburg Rd., Schaumburg, IL 60194,
(708) 885-6700.

This document contains a paper presented at a national conference in 1988 that briefly
describes Schaumburg's science assessment system, and a set of tests for students in grades
4-6. The tests have three parts: multiple-choice items to measure content and some process
skills, a self-report survey to assess attitudes toward science, and hands-on tasks to assess
science process skills.

The hands-on part attempts to measure 11 student science process skills: observing,
communicating, classifying, using numbers, measuring, inferring, predicting, controlling
variables, defining operationally, interpreting data, and experimenting. Students use
manipulatives to answer fixed questions such as "Which drop magnifies the most?" or
"Which clay boat would hold the most weights and still float in the water?" Students respond
by choosing an answer (multiple-choice), supplying a short answer, or, in a few cases,
drawing a picture or graph. Complete tests for grades 4, 5, and 6 are included.

No scoring procedures or technical information were included with the package. For
additional information on this project, see Teamwork Testing (Small, TC#650.3TEATES).

(TC#600.3SCIPRE) 



Small, Larry, and Jane Petrek. Teamwork Testing. Located in: Science Scope 15,
March 1992, pp. 29-30.

The authors describe a model for performance-based assessment in middle school chemistry
that emphasizes group cooperation and the process of doing science. One task is described
in detail. Performance criteria are hinted at but not described.

For other information on this project, see Science Process Evaluation Model (Small,
TC#600.3SCIPRE).



(TC#650.3TEATES)



Stecher, Brian M. Describing Secondary Curriculum in Mathematics and Science: Current
Status and Future Indicators, 1992. Available from: RAND, 1700 Main St.,
PO Box 2138, Santa Monica, CA 90407.

The author describes what could go into an indicator system for monitoring the health of
science and mathematics education. He concludes that current data sources for these
indicators are inadequate.

(TC#000.6DESSEC) 



Surber, John R. Map Tests (various documents). Available from: John R. Surber,
Department of Educational Psychology, University of Wisconsin, Milwaukee,
WI 53201, (414) 229-1122.

This is a collection of the following four documents. 
Surber, John R. Mapping as a Testing and Diagnostic Device. Located in: C. D. Holley
and D. F. Dansereau (Eds.), Spatial Learning Strategies, 1984, pp. 213-233. Available
from: Academic Press, Inc., 1250 6th Ave., San Diego, CA 92101.

Surber, John R., and Philip L. Smith. Testing for Misunderstanding. Located in: Educational
Psychologist 16, 1981, pp. 165-174.

Surber, John R., Philip L. Smith, and Frederika Harper. Technical Report No. 1.
Structural Maps of Text as a Learning Assessment Technique: Progress Report for
Phase I (undated).

Surber, John R., Philip L. Smith, and Frederika Harper. Technical Report No. 6. The
Relationship Between Map Tests and Multiple Choice Tests, March 1982.

These reports describe the development of map tests as an assessment technique to identify
conceptual misunderstandings that occur when students learn from text. In this testing
technique, concepts and their interrelationships are represented graphically. These graphic
representations are called text maps. A training manual for constructing text maps is included.
The manual introduces the symbols used in the map to indicate: 1) definitions,
2) characteristics or properties, 3) examples, 4) temporal relations, 5) causal relations,
6) similarity, and 7) greater-than or less-than comparisons.

The papers present four methods of using maps to assess the structure of student knowledge.
All involve deleting varying amounts of information from a completed text map and providing
clues on content and structure. Students fill in the missing information, much as in a cloze
test. Text maps and map tests can be constructed for any content area (science, social
studies, etc.) and can be used in study skills or reading classes. In these reports, the content
of the training manual is drawn from chemistry and study skills.

(TC#150.6MAPTES)

Vargas, Elena Maldonado, and Hector Joel Alvarez. Mapping Out Students' Abilities.
Located in: Science Scope 15, March 1992, pp. 41-43.

The authors use concept maps to assess the knowledge structures students hold about various
science concepts. They give some brief help on how to design a concept map, and more
extensive help on how to score maps. Two examples are given: matter and photosynthesis.
(See also John Surber, TC#150.6MAPTES.)

(TC#600.6MAPOUS)
Whetton, Chris, Marian Sainsbury, Steve Hopkins, et al. National Assessment in England
and Wales, 1992. Available from: National Foundation for Educational Research
(NFER), The Mere, Upton Park, Slough, Berks, SL1 2DQ, England, UK.

This document is a series of papers presented at the American Educational Research
Association meeting in 1992. It updates the status of the science assessment described in
other entries for Whetton: Science for Seven-Year-Olds (TC#600.6SCIFOS), The Pilot Study
of Standard Assessment Tasks for Key Stage 1 (TC#070.3STAASTM, in-house use only),
and Standard Assessment Tasks for Key Stage 1 (TC#100.3STAAST, in-house use only).
For additional information, see Harlen (TC#600.6PERTES).

The papers review the history of the assessment, describe and present a few examples of the
assessment tasks for seven-year-olds, discuss the support teachers need to administer such a
large number of performance tasks, describe the changes made for the 1992 assessment, and
briefly describe plans for the 14-year-old assessment.

(TC#600.6NATASE) 



Whetton, Chris. Science for Seven-Year-Olds in England and Wales, 1991. Available from:
National Foundation for Educational Research, The Mere, Upton Park, Slough, Berks
SL1 2DQ, England, UK.

This paper reports on the development in England and Wales of performance assessments that
are tied to the new national curriculum. In spring 1991, all seven-year-olds (600,000) were
tested. This paper discusses the pilot that was carried out in 1990 and the changes made for
the 1991 assessment. Although this paper addresses all subject areas, the examples are
selected from the science portion of the test.

Student performance was noted on over 200 "standards of achievement" observed during a 
series of specified performance tasks. In addition to these tasks, students also had a "science 
interview" to assess knowledge of specific facts. 

As a result of the pilot test, the full-scale assessment for 1991 was modified so that:

1. Fewer attainment targets will be noted; 200 separate judgments were too many for teachers 
to make. 

2. Not all attainment targets will be noted for each child; teachers will choose targets based
on previous assessment results.

3. Certain "core" targets will be covered for all students. In addition, one extra target in 
science and math will be selected for each student. 

4. Each task will focus on only one or two attainment targets.

5. Science interviews have been abandoned. 
A related document, The Pilot Study of Standard Assessment Tasks for Key Stage 1
(TC#070.3STAASTM), contains a complete description and analysis of the pilot; Standard
Assessment Tasks for Key Stage 1 (TC#100.3STAAST, in-house use only) contains the
complete 1991 assessment package for all content areas. For additional information, see other
entries for Whetton and Harlen (TC#600.6PERTES).

(TC#600.6SCIFOS)



Whetton, Chris, G. Ruddock, Steve Hopkins, et al. The Pilot Study of Standard Assessment
Tasks for Key Stage 1, 1991. Available from: National Foundation for Educational
Research, The Mere, Upton Park, Slough, Berks SL1 2DQ, England, UK.

This set of two reports describes the pilot test of the age-7 performance tests in England in
more detail than is reported in Science for Seven-Year-Olds in England and Wales
(TC#600.6SCIFOS). For other information, see additional entries for Whetton and Harlen
(TC#600.6PERTES).

(TC#070.3STAASTM, IN-HOUSE USE ONLY)

Whetton, Chris, G. Ruddock, Steve Hopkins, et al. Standard Assessment Tasks for Key
Stage 1, 1991. Available from: National Foundation for Educational Research, The
Mere, Upton Park, Slough, Berks SL1 2DQ, England, UK.

This package contains all the materials used by teachers for the age-7 Standard Assessment
Tasks: administration handbooks, detailed descriptions of tasks and scoring procedures,
information recording booklets, and student worksheets. For related information, see other
entries for Whetton and Harlen (TC#600.6PERTES).

(TC#100.3STAAST, IN-HOUSE USE ONLY)



Wiggins, Grant. The Futility of Trying to Teach Everything of Importance. Located in: 
Educational Leadership, November 1989, pp. 44-48, 57-59. 

Assessment has to reflect what we value. This article presents a philosophy for science
instruction that has implications for assessment. Specifically, the author maintains that the
goal of education should not be to teach every fact that we think students will need to know,
because that is impossible. Rather, we should concentrate on developing the habits of mind
and high standards of craftsmanship that will enable students to be lifelong learners and
critical thinkers. The article briefly mentions some of the implications of this philosophy
for assessment.

(TC#600.6FUTTRT) 
Williams, Susan E., Hersholt Waxman, and Juanita Copley. Calculator Mathematics
Curriculum Assessment, undated. Available from: University of Houston, College of
Education, Curriculum and Instruction Dept., Houston, TX 77204, (713) 743-9870.



These observation checklists were designed to collect research data pertaining to the use of 
calculators in secondary mathematics classes. The instruments focus on the quality of 
calculator instruction. Student and teacher behaviors are recorded on a checklist about ten 
times per item per classroom period. General areas assessed include teacher/student 
interactions, environment, management of time and students, activities, materials, content, 
instructional strategies, and specific classroom applications of calculators. Assessment is 
administered by the researcher while observing teachers conducting mathematics lessons. 
Instruments are available for observing the use of fraction, scientific, and graphing calculators. 
At this time, the assessment instrument is in the exploratory stage, though it has been
successfully piloted. This document includes only the observation forms; it contains neither
guidance on summarizing and interpreting results nor technical information.

(TC#500.4CALMAC)



Yager, Robert E., and Alan J. McCormack. Assessing Teaching/Learning Successes in
Multiple Domains of Science and Science Education. Located in: Science Education 73,
1989, pp. 45-58.

This article describes the authors' view of the proper targets for instruction in science 
(knowing and understanding, exploring and discovering, imagining and creating, feeling and 
valuing, and using and applying), goes on to describe the STS (Science-Technology-Society) 
approach to teaching science, and then lists some tests (mostly multiple-choice) that attempt 
to measure the targets. The paper is included in this bibliography mainly for the first two
points.

(TC#600.5ASSTEL) 



Yee, Gary, and Michael Kirst. Lessons from the New Science Curriculum of the 1950s and 
1960s. Located in: Education and Urban Society 26, February 1994, pp. 158-171. 

The title says it all: the article discusses what we need to do differently in the current
round of content standards to avoid the problems of the past.

(TC#600.5LESFRN)
Index Codes



A - Type

1 = Example
2 = Theory/how to assess/rationale for alternative assessment
3 = Content/what should be assessed
4 = Related: general assessment; program evaluation; results of studies; technology;
    attitudes

B - Purpose for the Assessment

1 = Large scale
2 = Classroom
3 = Research

C - Grade Levels

1 = Pre K-K
2 = 1-3
3 = 4-6
4 = 7-9
5 = 10-12
6 = Adult
7 = Special education
8 = All
9 = Other

D - Content Covered

1 = General science
2 = Biology
3 = Chemistry
4 = Physics
5 = Earth/Space Science
6 = Other
7 = All/Any



E - Type of Tasks

1 = Enhanced multiple choice
2 = Constructed response: short answers
3 = Long response/essay
4 = On-demand
5 = Project
6 = Portfolio
7 = Group
8 = Other than written
9 = Cognitive map

F - Skills Assessed

1 = Knowledge/conceptual understanding
2 = Application of concepts
3 = Persuasion
4 = Critical thinking/problem solving; reasoning/decision making
5 = Group process skills
6 = Quality of writing/communication
7 = Student self-reflection
8 = Process
9 = Comprehension

G - Type of Scoring

1 = Task specific
2 = General
3 = Holistic
4 = Analytical trait